Category: Engineering

  • Memory in AI Agents: Why It Matters More Than You Think

    The demos always look the same. A user types a question. The agent answers. Everyone nods. What the demos don’t show is what happens in the second conversation, or the fiftieth, when the agent has no idea who it’s talking to or what happened before.

    Memory is one of those things that seems optional until you try to ship something real. Then it becomes the problem you can’t ignore.

    Why context degrades without memory

    A language model’s context window is a finite resource. In a long conversation, you either pay to include more history (expensive, and eventually impossible) or you truncate (and lose information the agent needs). Neither is a satisfying answer.
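
    To make the truncation trade-off concrete, here is a minimal sketch of trimming history to a budget. The names `estimate_tokens` and `trim_history` are illustrative, and the word-count estimate is only a stand-in for a real tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def trim_history(messages: List[str], budget: int) -> List[str]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "I'm on the Pro plan",
    "Can you check order #4821?",
    "Sure, looking it up now",
    "It shipped yesterday",
]
trimmed = trim_history(history, budget=10)
print(trimmed)  # the two most recent messages; the plan fact is gone
```

    With a budget of 10 estimated tokens, only the last two messages survive, and the fact that the user is on the Pro plan is exactly what gets dropped.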

    The deeper issue is that different kinds of information need to be managed differently, even within a single conversation. Facts the user told you in an earlier session (“I’m on the Pro plan”) have a different lifetime from temporary working state (“I’m currently helping them with order #4821”).

    Three layers of memory

    The way we think about it at Korelos, there are three distinct layers:

    • Session memory: The current conversation. Active while the session is open, discarded after. Used for tracking intermediate state and the immediate context of what’s being discussed.
    • User memory: Persistent facts about a specific user. Their preferences, their plan tier, their history with your product. Persists across sessions. Scoped to the user identifier you provide.
    • Global memory: Shared context that applies to all interactions. Your product FAQ. Business rules. Things that don’t change per-user but that every conversation might need.
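
    One way to picture the three layers is as a single store with three differently scoped buckets. This is a hedged, in-memory sketch rather than how Korelos stores anything; `MemoryStore` and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryStore:
    # Global memory: shared by every conversation.
    global_facts: List[str] = field(default_factory=list)
    # User memory: keyed by user identifier, persists across sessions.
    user_facts: Dict[str, List[str]] = field(default_factory=dict)
    # Session memory: keyed by session identifier, discarded on close.
    session_state: Dict[str, List[str]] = field(default_factory=dict)

    def remember_user(self, user_id: str, fact: str) -> None:
        self.user_facts.setdefault(user_id, []).append(fact)

    def note_session(self, session_id: str, item: str) -> None:
        self.session_state.setdefault(session_id, []).append(item)

    def close_session(self, session_id: str) -> None:
        # Only the session bucket is purged; user and global facts remain.
        self.session_state.pop(session_id, None)

    def context_for(self, user_id: str, session_id: str) -> List[str]:
        return (
            self.global_facts
            + self.user_facts.get(user_id, [])
            + self.session_state.get(session_id, [])
        )


store = MemoryStore(global_facts=["Refunds are available within 30 days"])
store.remember_user("user-1", "On the Pro plan")
store.note_session("sess-1", "Helping with order #4821")
during = store.context_for("user-1", "sess-1")
store.close_session("sess-1")
after = store.context_for("user-1", "sess-2")
```

    After the first session closes, a new session for the same user still sees the global and user facts, but not the working state about the order.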

    How Korelos handles this

    When you deploy an agent on Korelos, you configure a memory policy as part of the agent definition. You decide which layers are active, what the retention window is, and what gets summarised vs. stored verbatim. The platform handles the mechanics: retrieving relevant context at the start of each turn, updating user memory when the agent extracts persistent facts, and purging session state when conversations close.
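
    As an illustration of what such a policy has to capture, the sketch below models the three decisions as a plain data structure. It is not the Korelos configuration format; `MemoryPolicy`, `Layer`, `StorageMode`, and every field name are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Layer(Enum):
    SESSION = "session"
    USER = "user"
    GLOBAL = "global"


class StorageMode(Enum):
    VERBATIM = "verbatim"
    SUMMARISED = "summarised"


@dataclass(frozen=True)
class MemoryPolicy:
    active_layers: FrozenSet[Layer]   # which layers are switched on
    retention_days: int               # how long persisted facts are kept
    storage_mode: StorageMode         # summarise or store verbatim

    def __post_init__(self) -> None:
        # Reject policies that could never store or retain anything.
        if not self.active_layers:
            raise ValueError("at least one memory layer must be active")
        if self.retention_days <= 0:
            raise ValueError("retention_days must be positive")


policy = MemoryPolicy(
    active_layers=frozenset({Layer.SESSION, Layer.USER}),
    retention_days=90,
    storage_mode=StorageMode.SUMMARISED,
)
```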

    None of this requires you to write any memory management code. It’s part of the infrastructure that Korelos handles by default.

    The thing most people get wrong

    The most common mistake is treating memory as a retrieval problem and ignoring the write side. Getting the right context into the model matters, but so does deciding what to write to long-term memory in the first place. If you store everything, retrieval becomes noisy. If you store nothing, the agent starts every session from scratch.

    Our agents use a lightweight extraction step at the end of each session to decide what, if anything, is worth persisting. You can configure what types of facts to extract or write your own extraction logic. Most teams start with the defaults and tune from there.
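
    A rough sketch of the write-side decision might look like the following. It assumes candidate facts have already been pulled out of the transcript and tagged with a type (in practice a model would likely do that); `CandidateFact`, `extract_persistent`, and the type names are hypothetical and do not describe the Korelos defaults.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class CandidateFact:
    kind: str  # hypothetical fact type, e.g. "plan_tier" or "working_state"
    text: str


def extract_persistent(
    candidates: Iterable[CandidateFact], persist_kinds: Set[str]
) -> List[CandidateFact]:
    """Keep only facts of a configured type, dropping exact duplicates."""
    seen: Set[str] = set()
    kept: List[CandidateFact] = []
    for fact in candidates:
        if fact.kind in persist_kinds and fact.text not in seen:
            seen.add(fact.text)
            kept.append(fact)
    return kept


session_candidates = [
    CandidateFact("plan_tier", "User is on the Pro plan"),
    CandidateFact("working_state", "Currently looking at order #4821"),
    CandidateFact("preference", "Prefers email over phone"),
    CandidateFact("plan_tier", "User is on the Pro plan"),
]
to_persist = extract_persistent(session_candidates, {"plan_tier", "preference"})
```

    Here the working state about the order is left behind with the session, and the repeated plan fact is written only once.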

  • The Three-Layer Stack Every Production AI Agent Needs

    If you’ve shipped more than one AI agent to production, you’ve probably noticed something: the architecture diagrams on day one and day ninety look almost nothing alike. Day one is a single endpoint that wraps a model call. Day ninety is a sprawl of retry loops, memory stores, tool wrappers, observability hooks, and prompt-versioning hacks duct-taped together by whoever was on call at 3 a.m.

    After eighteen months of watching teams iterate on this exact arc, we’ve come to believe the same three-layer pattern always emerges. The teams that get there fast win. The teams that get there slowly burn out and blame the model.

    Layer 1: The agent runtime

    This is the layer that takes a goal and a context, decides what to do next, and either calls a tool or produces a response. It is not just a model call. It’s a control loop with budgets (tokens, tool calls, wall-clock time), step-level fallbacks, and structured outputs.
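
    The shape of such a control loop can be sketched in a few lines. This is a simplified illustration, not a description of any particular runtime: `Budget`, `Step`, and `run_agent` are hypothetical names, the `decide` callable stands in for the model call, and token accounting is left out to keep the example short.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Budget:
    max_steps: int
    max_tool_calls: int
    max_seconds: float


@dataclass
class Step:
    tool: Optional[str] = None      # name of the tool to call, if any
    argument: str = ""
    response: Optional[str] = None  # final answer, if the agent is done


def run_agent(
    decide: Callable[[str, List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    goal: str,
    budget: Budget,
) -> str:
    observations: List[str] = []
    tool_calls = 0
    deadline = time.monotonic() + budget.max_seconds
    for _ in range(budget.max_steps):
        if time.monotonic() > deadline:
            return "Stopped: time budget exhausted."
        step = decide(goal, observations)
        if step.response is not None:
            return step.response
        if tool_calls >= budget.max_tool_calls:
            return "Stopped: tool-call budget exhausted."
        tool_calls += 1
        try:
            observations.append(tools[step.tool](step.argument))
        except Exception as exc:
            # Step-level fallback: surface the failure to the next decision.
            observations.append(f"tool {step.tool} failed: {exc}")
    return "Stopped: step budget exhausted."


def scripted_decide(goal: str, observations: List[str]) -> Step:
    # Stand-in for a model: call one tool, then answer from its result.
    if not observations:
        return Step(tool="lookup_order", argument="4821")
    return Step(response=f"Order status: {observations[-1]}")


tools = {"lookup_order": lambda order_id: f"order {order_id} shipped"}
answer = run_agent(scripted_decide, tools, "Where is order 4821?", Budget(5, 2, 10.0))
```

    The budgets and the fallback live in ordinary code, which is what makes them testable and measurable in a way that prompt text is not.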

    Common mistake: pushing all the logic into the prompt. The prompt is one input to this layer, not the layer itself. Anything stateful, anything that needs to be debugged, anything that needs metrics — that belongs in the runtime, not the prompt.

    Layer 2: The tool layer

    Tools are how the agent affects the outside world. The hard parts here are not the tools themselves; the hard parts are the contracts. Each tool needs a schema the model can target accurately, an executor that handles transient failures cleanly, and a result format that’s compact enough to fit back into the next agent step without blowing the context window.
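
    The three parts of that contract can be sketched together as follows. The `Tool` and `execute` names, the `TransientError` type, and the character limit are all assumptions for illustration; a real executor would also back off between retries.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


class TransientError(Exception):
    """Raised by a tool for failures that are worth retrying."""


@dataclass
class Tool:
    name: str
    description: str
    parameters: Dict[str, Any]          # JSON-Schema-style argument description
    run: Callable[..., Dict[str, Any]]


def execute(tool: Tool, args: Dict[str, Any], max_attempts: int = 3,
            max_chars: int = 200) -> str:
    # Check required arguments before doing any work.
    missing = [k for k in tool.parameters.get("required", []) if k not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool.run(**args)
            break
        except TransientError:
            if attempt == max_attempts:
                return json.dumps({"error": f"{tool.name} failed after {attempt} attempts"})
    # Compact encoding, truncated so it cannot blow up the next step's context.
    text = json.dumps(result, separators=(",", ":"))
    return text if len(text) <= max_chars else text[:max_chars] + "...[truncated]"


calls = {"count": 0}


def flaky_lookup(order_id: str) -> Dict[str, Any]:
    # Fails once to simulate a transient outage, then succeeds.
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientError("timeout")
    return {"order_id": order_id, "status": "shipped"}


lookup = Tool(
    name="lookup_order",
    description="Look up the status of an order by its ID.",
    parameters={
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
    run=flaky_lookup,
)
output = execute(lookup, {"order_id": "4821"})
```

    The model only ever sees the schema and the compact string that `execute` returns; the retry and the validation stay on the executor side of the contract.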

    We’ve seen teams build their tool layer as a thin wrapper around their existing API surface. That works for two weeks. Then they discover that the schemas LLMs work well with are not the schemas humans wrote for REST endpoints in 2018, and they have to redo the whole thing.

    Layer 3: Memory

    Memory is where production agents diverge from demos most dramatically. A demo can fit everything in the prompt. A production agent supporting thousands of conversations cannot.

    The memory layer needs at least three scopes: per-session (this conversation), per-user (everything we know about this person), and global (organization-level facts). Each scope has a different lifetime, a different retrieval strategy, and different write semantics. Treating them as one big bucket is the most common architectural mistake we see.

    If your memory layer is “stuff the last 20 messages into the prompt,” you have a demo, not a system. The day a customer comes back after a week and asks “remember when we…” is the day you find out.

    Why we built Korelos around this

    Every agent we’ve helped teams ship needed all three layers. Every team that tried to skip one paid for it later. Korelos exists because we got tired of watching smart engineers rebuild the same scaffolding for the fifth time. The runtime, tool layer, and memory layer are the platform — what you bring is the goal.