The case is the wire: why multi-agent context integrity is an architecture decision

Most of the conversation in this Expert Circle focuses on whether to use agents, which placement pattern to apply, or how to configure tools and guardrails. All very useful. But there’s a layer underneath all of that which breaks production deployments more than any of those decisions: what happens to context between agents in a multi-step workflow. I want to be specific, because this is not a theoretical problem.

What context loss actually looks like in a real workflow

Take a purely agentic, dynamic workflow with three agents. Agent 1 reads an incoming document, extracts structured and unstructured data, and passes it to Agent 2 for a compliance evaluation. Agent 2 produces a recommendation and hands it off to Agent 3, which drafts the resulting communication.
In isolation, each agent works and passes its tests. In the demo, the flow looks clean and powerful.
Once this moves into production, Agent 3 writes a communication that ignores a key caveat from Agent 1, because Agent 2 never forwarded it. The customer gets incorrect information. Nobody logged why, so the failure is almost impossible to trace.
This is the context problem. It is not only a model quality issue. It is an architectural issue, and it is one of the most common failure modes I see when enterprises move agentic workflows beyond proof of concept.
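A minimal sketch of that failure, in plain Python rather than any Pega or agent-framework API. Every name here (`Extraction`, `Recommendation`, the three `agent_*` functions, the sample caveat) is hypothetical, and the functions are stand-ins for LLM calls. The point is only that Agent 2's output type has no slot for upstream caveats, so the loss is silent:

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """Hypothetical output of Agent 1."""
    fields: dict[str, str]
    caveats: list[str] = field(default_factory=list)


@dataclass
class Recommendation:
    """Hypothetical output of Agent 2. Note: no slot for upstream caveats."""
    outcome: str
    rationale: str


def agent_1_extract(document: str) -> Extraction:
    # Stand-in for an LLM extraction step.
    return Extraction(
        fields={"customer": "ACME", "product": "loan"},
        caveats=["income statement is older than 12 months"],
    )


def agent_2_evaluate(extraction: Extraction) -> Recommendation:
    # Agent 2 can read the caveat, but its output cannot carry it forward.
    return Recommendation(outcome="approve", rationale="all required fields present")


def agent_3_draft(recommendation: Recommendation) -> str:
    # Agent 3 only sees what Agent 2 explicitly forwarded.
    return f"Your request was handled with outcome: {recommendation.outcome}."


extraction = agent_1_extract("...incoming document...")
message = agent_3_draft(agent_2_evaluate(extraction))
caveat_reached_agent_3 = any(c in message for c in extraction.caveats)
print(message)
print("caveat reached Agent 3:", caveat_reached_agent_3)
```

Each function passes its own unit test, which is why this kind of defect tends to surface only when the chain runs end to end.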

Two patterns that help before the theory

When you design a multi-agent workflow in Pega, the quality of your case data model directly determines the quality of your agent execution. Agents are only as good as the context they can read and write. And remember, an enterprise does not live in sessions, it lives in states. And the state is the case. From this perspective, two things I’ve seen make a real difference in the field:
• Define agent “read boundaries” explicitly. Don’t give agents unrestricted access to the entire data model. Structure what each agent can see and write. This limits prompt size (consider the token cost and its quadratic increase per step, which Brian discussed in his great article “Understanding Your Agentic Architectures Cost Impact”), reduces the hallucination surface, and makes debugging dramatically easier.
• Treat agent outputs as structured case data, not free text. If Agent 2 produces a compliance recommendation, map it to a typed property with an outcome value, not raw text. Use a Pega GenAI Connect rule to map from raw text to fields. Subsequent agents and human reviewers then work with these values instead of parsing an LLM’s prose.
These two patterns sound simple. In practice, they require upfront case design discipline that many teams skip in the rush to get the agent working.
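To make the two patterns concrete, here is a small sketch in plain Python. It is not Pega configuration and does not represent how a GenAI Connect rule works; the agent names, property names, `READ_BOUNDARIES`, `scoped_view`, and `parse_recommendation` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ComplianceOutcome(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REFER = "refer"


@dataclass(frozen=True)
class ComplianceRecommendation:
    """Typed result that downstream agents and reviewers read instead of prose."""
    outcome: ComplianceOutcome
    caveats: tuple[str, ...]


# Hypothetical read boundaries: which case properties each agent may see.
READ_BOUNDARIES = {
    "extraction_agent": {"document_text"},
    "compliance_agent": {"customer_name", "product", "extraction_caveats"},
    "drafting_agent": {"customer_name", "compliance_outcome", "extraction_caveats"},
}


def scoped_view(case_data: dict, agent: str) -> dict:
    """Return only the case properties the named agent is allowed to read."""
    allowed = READ_BOUNDARIES[agent]
    return {key: value for key, value in case_data.items() if key in allowed}


def parse_recommendation(raw: dict) -> ComplianceRecommendation:
    """Map a loosely structured model response onto typed fields.

    Raises ValueError for an outcome outside the enum, so bad values fail loudly.
    """
    outcome = ComplianceOutcome(raw["outcome"].strip().lower())
    return ComplianceRecommendation(outcome=outcome, caveats=tuple(raw.get("caveats", [])))


case_data = {
    "document_text": "...full incoming document...",
    "customer_name": "ACME",
    "product": "loan",
    "extraction_caveats": ["income statement is older than 12 months"],
    "compliance_outcome": "refer",
}
view = scoped_view(case_data, "drafting_agent")
rec = parse_recommendation({"outcome": " Refer ", "caveats": ["manual review needed"]})
```

The drafting agent never receives `document_text`, which keeps its prompt small, and an unexpected outcome value is rejected at the boundary instead of flowing downstream as text.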

Why it’s harder than it looks in pure-agentic frameworks

Agents are stateless by design. Each LLM call is a fresh context window. What gets “remembered” between steps is what you, as the architect, explicitly pass forward.
In frameworks like LangGraph, AutoGen, or CrewAI, this means you are managing a shared memory object yourself: deciding what to include, what to prune, and what risks being lost in summarization. When the chain is long, information degrades (see the table below on degradation by context window). This is where I consistently see the gap between “agentic” and “enterprise-ready.”
The practical implication is not the exact limits; it’s the degradation pattern.

Context windows & degradation behavior by model

How far each underlying model can be pushed before reasoning quality starts to drop and what that degradation looks like in agent runs.

| Model (underlying model) | Context window (tokens) | Typical degradation start | Agent behavior near the limit |
| --- | --- | --- | --- |
| Pega-Default-Fast (Claude Haiku 4.5) | ~200K | ~40K–70K | Starts dropping mid-context facts and shortening reasoning chains; higher hallucination risk. |
| Pega-Default-Smart (Claude Sonnet 4.6) | ~200K, up to 1M (beta) | ~60K–120K | Good long-context retention, but “middle blindness” appears and the agent becomes less reliable. |
| Claude Opus 4.6 | 200K–1M (beta) | ~100K–200K | Best long-context reasoning of the set; still degrades gradually across all lengths. |
| Gemini 3.5 Flash | ~1M | ~100K–300K | Strong at scale, but loses precision in dense context; needs structure to avoid drift. |
| Gemini Flash, older (1.5 / 2.5 family) | 200K–1M | ~50K–200K | Fast but less stable reasoning; more sensitive to noisy context. |
| GPT-5 / Mini / Nano | ~128K–400K | ~40K–120K | Degrades predictably; tends to compress reasoning rather than hallucinate early. |
| GPT-5.1 / GPT-5.5 | ~400K–1M | ~80K–200K | Strong structured reasoning, but context dilution affects multi-step workflows. |
| GPT-4 / 4o family | ~128K | ~30K–80K | Noticeable loss of earlier instructions; agents become reactive rather than planned. |
| Nova Premier | ~1M | ~150K–300K | Very large ingestion, but guidance warns performance declines as size grows. |
| Nova Pro | ~300K | ~60K–150K | Balanced, but less evidence of strong long-context reasoning. |
| Nemotron Super 3 120B | ~256K–1M | ~80K–250K | Strong long-context retrieval; still shows gradual degradation across length. |

Note: Token figures are approximate and vary by configuration, provider tier, and release. “Degradation start” marks where reasoning quality typically begins to slip — not a hard cutoff.

Sources:
• Context windows - Claude API Docs
• Claude Sonnet 4.5 - Amazon Bedrock
• Google models | Gemini Enterprise Agent Platform | Google Cloud Documentation
• https://ai.google.dev/gemini-api/docs/whats-new-gemini-3.5
• GPT-5.1 Model | OpenAI API
• https://build.nvidia.com/nvidia/nemotron-3-super-120b-a12b/modelcard
• NVIDIA Nemotron 3 Super 120B - Amazon Bedrock
• Utilizing long context windows - Amazon Nova
• https://assets.amazon.science/e5/e6/ccc5378c42dca467d1abe1628ec9/amazon-nova-premier-technical-report-and-model-card.pdf

The “Typical degradation start” values are observed effective ranges based on research plus field behavior, not vendor-guaranteed limits. See, for example, “Context Rot: Why LLMs Degrade as Context Grows (Complete Guide)” (Morph) and “Context Rot: How Increasing Input Tokens Impacts LLM Performance” (Chroma).

Agent-specific failure modes

In a nutshell, when context grows, agents’ behavior becomes unreliable:

| Stage | Behavior |
| --- | --- |
| Early (0–30%) | In general, consistent outcomes |
| Mid (30–60%) | Some drift observed, plus missed dependencies |
| Late (60–90%) | Confusion and contradictory reasoning |
| Near limit | Random recall, hallucinations and loss of plan |
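One way to act on these stages is to classify each agent step before the model is called. The sketch below assumes the percentages refer to the share of the context window already used, which the table does not state explicitly. It also assumes half-open boundaries (30% counts as mid, 60% as late, 90% and above as near limit). `context_stage` is a hypothetical helper, not part of any product.

```python
def context_stage(tokens_used: int, context_window: int) -> str:
    """Classify a prompt by the share of the context window it occupies.

    Assumption: stage percentages are context-window utilization.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    if tokens_used < 0:
        raise ValueError("tokens_used must not be negative")

    utilization = tokens_used / context_window
    if utilization < 0.30:
        return "early"
    if utilization < 0.60:
        return "mid"
    if utilization < 0.90:
        return "late"
    return "near limit"


# Example: a 200K-token window at different fill levels.
for used in (50_000, 100_000, 150_000, 190_000):
    print(used, context_stage(used, 200_000))
```

A check like this could gate a step, for instance by trimming the agent's read scope or routing to a human once a prompt leaves the early stage. The thresholds are the ones from the table and should be tuned against your own observations.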

Where Pega’s architecture helps

Pega doesn’t solve this problem through agent design alone. It solves it through case design.
The case is the persistent context store, the state. Every agent, whether a Step Agent, an Application Agent, or an external agent called via A2A, operates on a shared case state. The data model is defined. The clipboard is structured. The case history and audit are immutable. What Agent 1 writes into the case is available to Agent 3 without explicit forwarding logic, because the case is the wire.

This matters for three reasons:

• Context can’t be silently lost across agent handoffs, because the authoritative state lives in the case, not in a conversation thread that each agent has to forward correctly.
• Auditability is structural, not bolted on. You don’t need to log context passing; the case history is the log. In regulated markets, this is mandatory.
• Humans can intervene at any point in the workflow without breaking continuity, because the state isn’t locked inside an agent chain.
Most pure-agentic frameworks treat context passing as a prompt engineering problem. Pega treats it as a data architecture problem. That is the right framing for regulated enterprise environments.
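The idea of “the case is the wire” can be sketched with a toy model. This is plain Python and not how Pega implements cases, the clipboard, or case history; `Case`, `HistoryEntry`, and the property names are hypothetical. It only shows the shape of the argument: agents write to shared state, every write lands in an append-only history, and a later agent reads from the state, not from the previous agent's output.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class HistoryEntry:
    agent: str
    prop: str
    value: object
    at: datetime


@dataclass
class Case:
    """Toy stand-in for a case: current state plus an append-only history."""
    state: dict = field(default_factory=dict)
    _history: list = field(default_factory=list, repr=False)

    def write(self, agent: str, prop: str, value: object) -> None:
        # Every write updates the state and is recorded with its author.
        self.state[prop] = value
        self._history.append(HistoryEntry(agent, prop, value, datetime.now(timezone.utc)))

    @property
    def history(self) -> tuple:
        # Expose a read-only copy so callers cannot rewrite the audit trail.
        return tuple(self._history)


case = Case()
case.write("agent_1", "extraction_caveat", "income statement is older than 12 months")
case.write("agent_2", "compliance_outcome", "approve")  # Agent 2 forwards nothing explicitly

# Agent 3 reads from the case, not from Agent 2's output.
draft = (
    f"Outcome: {case.state['compliance_outcome']}. "
    f"Please note: {case.state['extraction_caveat']}."
)
case.write("agent_3", "draft_message", draft)

for entry in case.history:
    print(entry.agent, "wrote", entry.prop)
```

Here the caveat from Agent 1 reaches Agent 3 even though Agent 2 never mentions it, and the history answers the question of who wrote what without any separate logging step.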

The broader point

The industry conversation about agentic AI is still largely about capability: what agents can do. I see this daily. The harder, less glamorous conversation is about context integrity: what agents reliably know at each step of a multi-task, multi-agent workflow.
Pega’s case model is an architectural answer to that problem. It’s worth making that argument explicitly with clients, because most of them won’t arrive at it on their own. Context is not a side concern in agentic systems. It’s the system itself.
Curious whether others have hit this in delivery. What patterns have you found effective for managing context across complex agent chains?

Great article! It really highlights that governance, token/cost control, reliability, and compliance all come back to how well context is managed across the workflow. This is critical from an enterprise perspective.

Great article Fernando, and a framing I wish more teams would internalize before they hit production.

Your point about treating agent outputs as typed properties rather than free text is where I’d like to extend the conversation. Right now, most teams implementing multi-agent workflows in Pega are designing their own data models for agent context from scratch and, unsurprisingly, they end up with wildly different structures across projects.

I think there’s a natural next step here: a standardized, reusable Data Type at the application layer that captures agent context in a consistent way across all agents in a workflow. Something like a default AgentContext page structure that carries the input scope, structured output with typed outcomes, confidence or caveat flags, and handoff metadata including which agent produced it, timestamp, and case state at point of write. Teams would extend it for their domain, but the baseline contract would be consistent. This would make your “read boundary” pattern far easier to enforce and audit, because you’re not just disciplining what agents can see, you’re standardizing how they write back.
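As a rough illustration of what such a baseline contract could look like, here is a plain Python sketch. It is not a Pega Data Type definition, and every field name, the `HandoffMetadata` class, and the sample values (including the status string) are assumptions chosen to mirror the elements listed above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HandoffMetadata:
    produced_by: str      # which agent wrote this context
    written_at: datetime  # timestamp of the write
    case_status: str      # case state at the point of write


@dataclass(frozen=True)
class AgentContext:
    input_scope: tuple[str, ...]  # properties the agent was allowed to read
    outcome: str                  # structured, typed outcome value
    caveats: tuple[str, ...]      # caveat flags carried forward explicitly
    confidence: Optional[float]   # optional confidence indicator
    handoff: HandoffMetadata


ctx = AgentContext(
    input_scope=("customer_name", "product"),
    outcome="refer",
    caveats=("income statement is older than 12 months",),
    confidence=0.72,
    handoff=HandoffMetadata(
        produced_by="compliance_agent",
        written_at=datetime.now(timezone.utc),
        case_status="Pending-Review",  # hypothetical status value
    ),
)

# A plain-dict form that could be stored on the case or inspected in an audit.
record = asdict(ctx)
```

A domain team would extend this with its own outcome types, while the handoff block stays identical across agents.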

As Pega’s agentic capabilities mature, I’d also expect and hope to see a more formal orchestration construct emerge that treats this kind of context contract as a first-class design artifact, rather than leaving it entirely to each team’s case design discipline.

Curious if others have built something like this already, or whether we’re all reinventing it project by project.

Fernando, thank you for sharing these insights. It opened my eyes to the importance of the case in making agents effective, efficient, and reliable.