Most of the conversation in this Expert Circle focuses on whether to use agents, which placement pattern to apply, or how to configure tools and guardrails. All very useful. But there’s a layer underneath all of that which breaks production deployments more than any of those decisions: what happens to context between agents in a multi-step workflow. I want to be specific, because this is not a theoretical problem.
What context loss actually looks like in a real workflow
Let’s take a purely agentic, dynamic workflow. You build a three-agent workflow. Agent 1 reads an incoming document, extracts structured and unstructured data, and passes it to Agent 2 for a compliance evaluation. Agent 2 then produces a recommendation and hands it off to Agent 3, which drafts the outgoing communication.
Tested in isolation, each agent works fine. In the demo, the flow looks clean and powerful.
Once we move this into production, Agent 3 writes a communication that ignores a key caveat from Agent 1, because Agent 2 never forwarded it. The customer gets incorrect information. Nobody logged why, so the failure is almost impossible to trace.
This is the context problem. It is not only a model quality issue. It is an architectural issue, and it is one of the most common failure modes I see when enterprises move agentic workflows beyond proof of concept.
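To make the failure concrete, here is a minimal Python sketch of the three-agent chain described above. It is illustrative only: the function names, field names, and hard-coded values are assumptions, and the LLM calls are replaced by stubs. It only shows how a caveat disappears when the middle agent returns just its own result.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """Agent 1 output: extracted fields plus free-text caveats."""
    fields: dict[str, str]
    caveats: list[str] = field(default_factory=list)


def agent1_extract(document: str) -> Extraction:
    # Stand-in for an LLM call; values are hard-coded for illustration.
    return Extraction(
        fields={"customer_id": "C-1001", "requested_limit": "50000"},
        caveats=["Income statement is older than 12 months"],
    )


def agent2_evaluate(extraction: Extraction) -> dict[str, str]:
    # Agent 2 returns only its own recommendation; the caveats are not forwarded.
    return {"recommendation": "approve"}


def agent3_draft(handoff: dict[str, str]) -> str:
    # Agent 3 can only use what it was handed.
    caveats = handoff.get("caveats", "none reported")
    return f"Decision: {handoff['recommendation']}. Caveats: {caveats}."


message = agent3_draft(agent2_evaluate(agent1_extract("incoming document text")))
print(message)
```

Every function behaves correctly on its own, which is why isolated testing does not catch this. The defect is in the handoff contract, not in any single agent.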
Two patterns that help before the theory
When you design a multi-agent workflow in Pega, the quality of your case data model directly determines the quality of your agent execution. Agents are only as good as the context they can read and write. And remember, an enterprise does not live in sessions; it lives in states, and the state is the case. From this perspective, two things I’ve seen make a real difference in the field:
Define agent “read boundaries” explicitly. Don’t let agents have unrestricted access to the entire data model. Structure what each agent can see and write. This limits prompt size (consider the token cost and the quadratic increase per step that Brian discussed in this great article, “Understanding Your Agentic Architectures Cost Impact”), reduces the hallucination surface, and makes debugging dramatically easier.
Treat agent outputs as structured case data, not free text. If Agent 2 produces a compliance recommendation, map it to a typed property with an outcome value, not raw text. Use a Pega GenAI Connect rule to map from raw text to fields. Subsequent agents and human reviewers then work with these values instead of parsing an LLM’s prose.
These two patterns sound simple. In practice, they require upfront case design discipline that many teams skip in the rush to get the agent working.
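As a rough illustration of both patterns, here is a plain Python sketch. It is not Pega rule configuration. The names `AgentBoundary`, `read_view`, `write_back`, `parse_outcome`, and the property names are all hypothetical, chosen only to show the idea of an explicit read/write boundary and a typed outcome value.

```python
from dataclasses import dataclass
from enum import Enum


class ComplianceOutcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    NEEDS_REVIEW = "needs_review"


@dataclass(frozen=True)
class AgentBoundary:
    """Hypothetical per-agent boundary: which case properties it may read and write."""
    readable: frozenset[str]
    writable: frozenset[str]


def read_view(case: dict, boundary: AgentBoundary) -> dict:
    """Return only the properties the agent is allowed to see."""
    return {k: v for k, v in case.items() if k in boundary.readable}


def write_back(case: dict, boundary: AgentBoundary, updates: dict) -> None:
    """Apply updates, rejecting any property outside the write boundary."""
    blocked = set(updates) - boundary.writable
    if blocked:
        raise PermissionError(f"agent may not write: {sorted(blocked)}")
    case.update(updates)


def parse_outcome(raw_text: str) -> ComplianceOutcome:
    """Map raw model text to a typed value; unrecognized text goes to review."""
    try:
        return ComplianceOutcome(raw_text.strip().lower())
    except ValueError:
        return ComplianceOutcome.NEEDS_REVIEW


case = {
    "customer_id": "C-1001",
    "extracted_terms": "24-month term, variable rate",
    "internal_notes": "not relevant to the compliance agent",
}
compliance_boundary = AgentBoundary(
    readable=frozenset({"customer_id", "extracted_terms"}),
    writable=frozenset({"compliance_outcome"}),
)

view = read_view(case, compliance_boundary)  # what goes into the prompt
write_back(case, compliance_boundary, {"compliance_outcome": parse_outcome(" PASS ")})
```

Routing unrecognized text to `NEEDS_REVIEW` is one possible design choice: it sends ambiguous model output to a human rather than guessing an outcome.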
Why it’s harder than it looks in pure-agentic frameworks
Agents are stateless by design. Each LLM call is a fresh context window. What gets “remembered” between steps is what you, as the architect, explicitly pass forward.
In frameworks like LangGraph, AutoGen, or CrewAI, this means you are managing a shared memory object yourself: deciding what to include, what to prune, and what risks being lost in summarization. When the chain is long, information degrades (see the table below for typical degradation per context window). This is where I consistently see the gap between “agentic” and “enterprise-ready.”
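The pruning decision can be sketched in plain Python. This does not use the API of LangGraph, AutoGen, or CrewAI. `MemoryEntry`, `rough_tokens`, and `prune` are hypothetical names, and counting one token per word is a crude assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    source: str
    text: str
    pinned: bool = False  # pinned entries survive pruning


def rough_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def prune(entries: list[MemoryEntry], budget: int) -> list[MemoryEntry]:
    """Keep every pinned entry, then the most recent unpinned entries that fit."""
    used = sum(rough_tokens(e.text) for e in entries if e.pinned)
    keep = {i for i, e in enumerate(entries) if e.pinned}
    for i in range(len(entries) - 1, -1, -1):  # walk from newest to oldest
        entry = entries[i]
        if entry.pinned:
            continue
        cost = rough_tokens(entry.text)
        if used + cost > budget:
            break  # sliding window: stop at the first entry that does not fit
        keep.add(i)
        used += cost
    return [e for i, e in enumerate(entries) if i in keep]  # original order


texts = [
    ("agent1", "Caveat: income statement is older than 12 months"),
    ("agent1", "Extracted customer id and requested limit from the uploaded form"),
    ("agent2", "Recommendation: approve, subject to manual review"),
]

# Without pinning, recency wins and the oldest entry (the caveat) is dropped.
unpinned = [MemoryEntry(src, text) for src, text in texts]
kept_unpinned = prune(unpinned, budget=15)

# With the caveat pinned, it survives the same budget.
pinned = [MemoryEntry(src, text, pinned=text.startswith("Caveat")) for src, text in texts]
kept_pinned = prune(pinned, budget=15)
```

The point is that what survives is a policy you write and own. A recency-only window quietly discards early caveats unless something marks them as must-keep.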
The practical implication is not the exact limits; it’s the degradation pattern.
Context windows & degradation behavior by model
How far each underlying model can be pushed before reasoning quality starts to drop and what that degradation looks like in agent runs.
| Underlying model | Context window (tokens) | Typical degradation start (tokens) | Agent behavior near the limit |
|---|---|---|---|
| Pega-Default-Fast (Claude Haiku 4.5) | ~200K | ~40K–70K | Starts dropping mid-context facts and shortening reasoning chains; higher hallucination risk. |
| Pega-Default-Smart (Claude Sonnet 4.6) | ~200K, up to 1M (beta) | ~60K–120K | Good long-context retention, but “middle blindness” appears and the agent becomes less reliable. |
| Claude Opus 4.6 | ~200K, up to 1M (beta) | ~100K–200K | Best long-context reasoning of the set; still degrades gradually across all lengths. |
| Gemini 3.5 Flash | ~1M | ~100K–300K | Strong at scale, but loses precision in dense context; needs structure to avoid drift. |
| Gemini Flash (older 1.5 / 2.5 family) | 200K–1M | ~50K–200K | Fast but less stable reasoning; more sensitive to noisy context. |
| GPT-5 / Mini / Nano | ~128K–400K | ~40K–120K | Degrades predictably; tends to compress reasoning rather than hallucinate early. |
| GPT-5.1 / GPT-5.5 | ~400K–1M | ~80K–200K | Strong structured reasoning, but context dilution affects multi-step workflows. |
| GPT-4 / 4o family | ~128K | ~30K–80K | Noticeable loss of earlier instructions; agents become reactive rather than planned. |
| Nova Premier | ~1M | ~150K–300K | Very large ingestion, but guidance warns performance declines as size grows. |
| Nova Pro | ~300K | ~60K–150K | Balanced, but less evidence of strong long-context reasoning. |
| Nemotron Super 3 120B | ~256K–1M | ~80K–250K | Strong long-context retrieval; still shows gradual degradation across length. |
Note: Token figures are approximate and vary by configuration, provider tier, and release. “Degradation start” marks where reasoning quality typically begins to slip — not a hard cutoff.
Sources:
Context windows - Claude API Docs, Claude Sonnet 4.5 - Amazon Bedrock, Google models | Gemini Enterprise Agent Platform | Google Cloud Documentation, https://ai.google.dev/gemini-api/docs/whats-new-gemini-3.5, GPT-5.1 Model | OpenAI API, https://build.nvidia.com/nvidia/nemotron-3-super-120b-a12b/modelcard, NVIDIA Nemotron 3 Super 120B - Amazon Bedrock, Utilizing long context windows - Amazon Nova, https://assets.amazon.science/e5/e6/ccc5378c42dca467d1abe1628ec9/amazon-nova-premier-technical-report-and-model-card.pdf
The “Typical degradation start” ranges are observed effective ranges based on research plus field behavior, not vendor-guaranteed limits. See, for example, “Context Rot: Why LLMs Degrade as Context Grows (Complete Guide)” (Morph) and “Context Rot: How Increasing Input Tokens Impacts LLM Performance” (Chroma).
Agent-specific failure modes
In a nutshell, as context grows, agent behavior becomes less reliable:
| Stage | Behavior |
|---|---|
| Early (0–30%) | In general, consistent outcomes |
| Mid (30–60%) | Some drift observed, plus missed dependencies |
| Late (60–90%) | Confusion and contradictory reasoning |
| Near limit | Random recall, hallucinations and loss of plan |
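One way to act on these stages is a simple guard that runs before each agent step. The sketch below assumes the percentages refer to the share of the context window already used, which the table does not state explicitly. The function name `context_stage` and the handling of boundary values are also assumptions.

```python
def context_stage(tokens_used: int, context_window: int) -> str:
    """Classify context fill into the stages from the table above.

    Assumption: percentages are the share of the context window in use.
    Boundary values (exactly 30%, 60%, 90%) are assigned to the later stage.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    if tokens_used < 0:
        raise ValueError("tokens_used must not be negative")

    ratio = tokens_used / context_window
    if ratio < 0.30:
        return "early"
    if ratio < 0.60:
        return "mid"
    if ratio < 0.90:
        return "late"
    return "near limit"


# Example with a nominal 200K-token window.
for used in (50_000, 90_000, 150_000, 190_000):
    print(used, context_stage(used, 200_000))
```

A workflow could use the result to trigger summarization, trim the agent's read scope, or route the step to a human once the stage reaches "late".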
Where Pega’s architecture helps
Pega doesn’t solve this problem through agent design alone. It solves it through case design.
The case is the persistent context store, the state. Every agent, whether a Step Agent, an Application Agent, or an external agent called via A2A, operates on a shared case state. The data model is defined. The clipboard is structured. The case history and audit are immutable. What Agent 1 writes into the case is available to Agent 3 without explicit forwarding logic, because the case is the wire.
This matters for three reasons:
• Context can’t be silently lost across agent handoffs, because the authoritative state lives in the case, not in a conversation thread that each agent has to forward correctly.
• Auditability is structural, not bolted on. You don’t need to log context passing; the case history is the log. In regulated markets, this is a mandate.
• Humans can intervene at any point in the workflow without breaking continuity, because the state isn’t locked inside an agent chain.
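The idea behind these three points can be sketched as a toy model in Python. This is not Pega's implementation or API. `Case`, `HistoryEntry`, and the property names are hypothetical, and the sketch only shows a shared state with an append-only history that agents and humans all write through.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class HistoryEntry:
    """One immutable audit record: who wrote which property, and when."""
    timestamp: datetime
    actor: str
    prop: str
    value: object


@dataclass
class Case:
    """Toy stand-in for a case: shared state plus an append-only history."""
    case_id: str
    _data: dict = field(default_factory=dict)
    _history: list = field(default_factory=list)

    def write(self, actor: str, prop: str, value: object) -> None:
        # Every write updates the state and records an audit entry.
        self._data[prop] = value
        self._history.append(
            HistoryEntry(datetime.now(timezone.utc), actor, prop, value)
        )

    def read(self, prop: str, default=None):
        return self._data.get(prop, default)

    @property
    def history(self) -> tuple:
        # Expose a read-only snapshot of the audit trail.
        return tuple(self._history)


case = Case("CASE-1")
case.write("agent1", "caveat", "Income statement is older than 12 months")
case.write("agent2", "compliance_outcome", "approve")
case.write("human_reviewer", "compliance_outcome", "needs_review")  # human steps in

# Agent 3 reads the case directly; nothing depended on Agent 2 forwarding the caveat.
draft = f"Outcome: {case.read('compliance_outcome')}. Caveat: {case.read('caveat')}."
print(draft)
```

Compared with passing a payload from agent to agent, the caveat written by the first agent stays readable for the last one, and the human override shows up in the same history as the agent writes.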
Most pure-agentic frameworks treat context passing as a prompt engineering problem. Pega treats it as a data architecture problem. That is the right framing for regulated enterprise environments.
The broader point
The industry conversation about agentic AI is still largely about capability: what agents can do. I see this daily. The harder, less glamorous conversation is about context integrity: what agents reliably know at each step of a multi-task, multi-agent workflow.
Pega’s case model is an architectural answer to that problem. It’s worth making that argument explicitly with clients, because most of them won’t arrive at it on their own. Context is not a side concern in agentic systems. It’s the system itself.
Curious whether others have hit this in delivery. What patterns have you found effective for managing context across complex agent chains?
