ENTERPRISE AI ARCHITECTURE · OPINION
Understanding Your Agentic Architecture
Fully autonomous agents and hybrid agentic workflows both have a place in the enterprise. The decision is which workloads belong in which, and what each architecture costs you at scale.
By Brian Feinberg · Enterprise Architecture · AI Cost Modeling · Workflow Orchestration
Everyone is building with agents. Few are modeling what agents actually cost when they run at enterprise scale: not the demo, not the pilot, but 350,000 cases a month across a regulated back-office workflow.
There’s a conversation happening in enterprise architecture right now that isn’t showing up in the vendor decks. It’s about what happens between the impressive demo and the quarterly infrastructure bill. It’s about the difference between a cost model that scales linearly with your business and one that quietly grows faster than your revenue.
The trigger for that conversation, more often than not, is a token bill that came in higher than expected. Let’s discuss why this happens structurally, and what you can do about it.
1 Two ways to build an agentic workflow
When you’re building an AI-powered workflow (a claims investigation, a complaint resolution, a fraud review), you have a fundamental architectural choice to make. It’s not about which model you use. It’s about who’s in charge of deciding what happens next.
In a fully autonomous, LLM-orchestrated architecture, a master agent (an LLM-driven control loop that owns the workflow) reads the conversation history and decides at each step what to do. Every action it takes, whether a tool call, a lookup, or a downstream decision, goes through the model. The model is the orchestrator. It reasons its way through the workflow, step by step.
In a hybrid or platform-orchestrated architecture, deterministic rules handle workflow routing, sequencing, conditionals, escalation logic, and compliance gates; the model is invoked only at the specific steps that genuinely require language understanding or judgment. It’s not the orchestrator; it’s a specialized capability called on demand.
Both approaches can produce excellent outcomes. The difference that matters at enterprise scale isn’t quality; it’s what each architecture does to your cost curve.
Figure 2. Two architectures for an agentic workflow. Left: a master agent re-invokes itself each step, accumulating context. Right: a deterministic engine routes between steps, calling the model only where judgment is required.
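A minimal Python sketch can make the control-flow difference concrete. Everything in it is hypothetical: `fake_model` stands in for an LLM call, and the step names and the set of judgment steps are illustrative, not any framework's API. It shows only who decides the next step and how often the model is invoked.

```python
from typing import Callable

# Hypothetical ordered plan for a claims-style workflow (illustrative only).
PLAN = ["fetch_policy", "fetch_claim", "check_coverage", "assess_narrative",
        "compliance_gate", "draft_decision"]

# Steps assumed to genuinely need language understanding or judgment.
JUDGMENT_STEPS = {"assess_narrative", "draft_decision"}


def fake_model(history: list[str], plan: list[str]) -> str:
    """Hypothetical stand-in for an LLM call that picks the next action.

    In the autonomous design it receives the entire history on every call.
    """
    done = {entry.split(":", 1)[0] for entry in history}
    for step in plan:
        if step not in done:
            return step
    return "finish"


def run_autonomous(model: Callable[[list[str], list[str]], str]) -> int:
    """Master-agent loop: the model chooses every next step."""
    history: list[str] = []
    calls = 0
    while True:
        calls += 1                       # one model call per orchestration decision
        action = model(history, PLAN)
        if action == "finish":
            return calls
        history.append(f"{action}: result")  # each tool output joins the history


def run_hybrid(model_step: Callable[[str], str]) -> int:
    """Deterministic engine: fixed sequence, model invoked only at judgment steps."""
    calls = 0
    for step in PLAN:                    # routing is plain code, not a model decision
        if step in JUDGMENT_STEPS:
            calls += 1
            model_step(step)             # the model sees only this step's input
        # every other step is an ordinary API call or rule; no tokens spent
    return calls


print(run_autonomous(fake_model))        # prints 7
print(run_hybrid(lambda step: "ok"))     # prints 2
```

In a real autonomous framework the model's choices are not scripted, which is why each decision costs a model call; in the hybrid version the sequence is code, so the only model calls are the judgment steps.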
2 Autonomous agents reason over their full history
Here is the thing about autonomous agents that most architecture discussions underweight: every time the master agent takes a step, it re-reads everything that happened before that step.
This is how language models work. It is not a bug, and there are mitigations, covered in section 4. In practice, though, it means the cost of each step increases with every step that precedes it. The first step is cheap. The fortieth reads a context window that has been accumulating across the entire workflow: every tool response, every intermediate decision, every piece of data retrieved along the way.
THE MATH, SIMPLY PUT
Imagine each step adds 2,000 tokens of history. A 10-step workflow costs roughly 2K + 4K + 6K + … + 20K tokens just in orchestration context: about 110,000 tokens to orchestrate what might be 25,000 tokens of actual work.
A 40-step workflow under the same logic doesn’t cost 4× as much. It costs roughly 15× as much in orchestration overhead (about 1.64 million tokens), because the context-window sum grows as a triangular series, not a straight line.
This is the shape of the cost curve: quadratic in workflow length. Not linear. Not flat.
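The arithmetic in the box can be checked in a few lines, under the same simplifying assumption that every step adds a fixed 2,000 tokens and that step k re-reads k steps' worth of history. The function name is illustrative. Quadrupling the step count multiplies the orchestration context by about 15, and that ratio approaches 16 as workflows get longer.

```python
def orchestration_tokens(steps: int, tokens_per_step: int = 2_000) -> int:
    """Total history re-read across a run where each step adds a fixed amount.

    Step k re-reads k * tokens_per_step tokens, so the total is the
    triangular series tokens_per_step * n(n + 1) / 2.
    """
    return tokens_per_step * steps * (steps + 1) // 2


ten = orchestration_tokens(10)      # 110,000
forty = orchestration_tokens(40)    # 1,640,000
print(ten, forty, round(forty / ten, 1))   # prints 110000 1640000 14.9
```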
Figure 1. Per-case cost grows quadratically with workflow length; at enterprise volume, the per-case gap compounds into a structurally different monthly bill.
The platform-orchestrated alternative has no master agent re-reading history at every step. The deterministic engine decides what comes next, and that decision spends no tokens.
“The model isn’t just doing the work at each step. It’s re-reading every step that came before. At scale, that re-reading costs more than the work itself.”
3 Tool calls are an accelerant
The context window doesn’t just grow from the model’s own reasoning. It grows from what the model retrieves. Every tool call (an API request, a database lookup, a document fetch) returns a result, and that result enters the history verbatim.
Enterprise APIs are designed for completeness, not for LLM consumption. A policy record might return 80 fields. A CRM lookup might return 50. A transaction history query might return structured JSON with hundreds of rows. Of the 80 policy fields, the agent uses 5. The other 75 enter the context window and stay there, re-read at every subsequent step for the rest of the case.
This is where the cost acceleration becomes dramatic. In a complex enterprise workflow, tool calls happen at most steps. Each one deposits a payload into the context window that is often ten or more times larger than the information the agent actually needed. By step 30, the model is re-reading the equivalent of a full case file, most of which it retrieved at steps 2, 5, and 11 and never looked at again.
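To see why these payloads accelerate the curve, take the three retrievals the paragraph mentions, at steps 2, 5, and 11 of a 30-step run, and count how many later steps re-read each one. The payload sizes below are illustrative assumptions, not measurements, and the sketch assumes nothing is pruned.

```python
# Hypothetical sizes: a full enterprise record vs the handful of fields used.
PAYLOAD_TOKENS = 4_000     # e.g. an 80-field policy record serialized as JSON
USED_TOKENS = 250          # roughly the 5 fields the agent actually needs
TOTAL_STEPS = 30
FETCH_STEPS = [2, 5, 11]   # steps at which large payloads were retrieved


def rereads(fetched_at: int, total_steps: int) -> int:
    """Number of later steps that re-read a payload fetched at `fetched_at`."""
    return total_steps - fetched_at


total_rereads = sum(rereads(s, TOTAL_STEPS) for s in FETCH_STEPS)  # 28 + 25 + 19 = 72
full_cost = total_rereads * PAYLOAD_TOKENS                         # 288,000 tokens
unused_cost = total_rereads * (PAYLOAD_TOKENS - USED_TOKENS)       # 270,000 tokens
print(total_rereads, full_cost, unused_cost)   # prints 72 288000 270000
```

Under these assumptions, more than 90% of the re-read tokens from just three lookups are fields the agent never used.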
ARCHITECTURAL RISK
Tool result bloat doesn’t just cost tokens. It degrades model performance. Models reasoning over extremely long, repetitive contexts tend to produce lower-quality outputs than models reasoning over focused, relevant context. The cost and quality problems compound each other.
4 The gap you can close, and the gap you cannot
This is where honest architectural thinking becomes important. There are real and meaningful interventions available to teams running fully autonomous frameworks: tool result pruning, provider-level prompt caching, model tiering for orchestration decisions, and context resets at phase boundaries. Each attacks a different component of the cost curve, and together they can reduce per-run costs by 50–65%.
The honest observation is this: the mitigations close the gap substantially, but to close it structurally, you end up building a deterministic orchestration layer around your agent calls anyway. You’re managing state. You’re enforcing sequences. You’re coding rules for what happens at each step. You’re separating “things the model decides” from “things the workflow decides.” And the mitigations themselves (caches, compression layers, pruning pipelines) are governed software in their own right: they have to be built, run, monitored, and brought inside the same audit and data-residency boundary that already governs the workflow.
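Two of those four mitigations, tool result pruning and context resets, are application code rather than provider or deployment settings, so they can be sketched directly. The helper names, the hypothetical 80-field record, and the roughly-four-characters-per-token estimate are all assumptions for illustration, not a real tokenizer or library API.

```python
import json


def prune_tool_result(result: dict, keep: set[str]) -> dict:
    """Tool result pruning: keep only the fields later steps actually need."""
    return {k: v for k, v in result.items() if k in keep}


def reset_at_phase_boundary(history: list[str], summary: str, keep_last: int = 0) -> list[str]:
    """Context reset: replace a finished phase's history with a short summary,
    optionally carrying the last few entries over verbatim."""
    carried = history[-keep_last:] if keep_last else []
    return [f"[phase summary] {summary}", *carried]


def approx_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


# Hypothetical 80-field policy record; only five fields matter downstream.
record = {f"field_{i}": f"value_{i}" for i in range(75)}
record.update({"policy_id": "P-123", "status": "active", "coverage": "comprehensive",
               "deductible": 500, "effective_date": "2024-01-01"})
needed = {"policy_id", "status", "coverage", "deductible", "effective_date"}

full = json.dumps(record)
pruned = json.dumps(prune_tool_result(record, needed))
print(approx_tokens(full), approx_tokens(pruned))

history = [full, "check_coverage: covered", "assess_narrative: consistent"]
history = reset_at_phase_boundary(history, "policy P-123 active; claim covered")
print(len(history))  # prints 1
```

Note that the `needed` set and the choice of phase boundary are deterministic rules someone has to write and maintain.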
“Every mitigation that meaningfully reduces autonomous agent cost is, in some form, an act of putting determinism back into the workflow. Which raises the question: where does that determinism live best?”
5 The architecture you build toward is already a platform
This is the uncomfortable truth that most teams encounter around month six of a serious agentic deployment. As you optimize (adding caching layers, state machines, tool result processors, phase boundaries, deterministic routers), you are building, piece by piece, a workflow orchestration platform.
You’re building case management, because you need to track state across steps and resume where you left off. You’re building audit trail infrastructure, because regulated workflows require an examiner-ready record of every decision and its basis. You’re building escalation logic, because not every decision should go to the model and not every outcome should be automatic. You’re building governance controls, because the model will occasionally produce something that needs human review before it acts.
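Each of those responsibilities turns into code quickly. The sketch below shows the smallest version: a case record that persists state, logs every decision with who made it and on what basis, and applies a deterministic escalation rule before a model-driven outcome takes effect. The class, field names, and thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Case:
    """Minimal case record: state that survives across steps, plus an audit trail."""
    case_id: str
    state: str = "intake"
    audit: list[dict] = field(default_factory=list)

    def record(self, step: str, decided_by: str, basis: str) -> None:
        # Log every decision with who made it and why (an examiner-ready record).
        self.audit.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "step": step,
            "decided_by": decided_by,   # "rule", "model", or "human"
            "basis": basis,
        })


def needs_human_review(model_confidence: float, amount: float) -> bool:
    """Hypothetical escalation rule: low-confidence or high-value outcomes go to a human."""
    return model_confidence < 0.8 or amount > 50_000


def resolve(case: Case, model_confidence: float, amount: float) -> str:
    case.record("route", "rule", f"amount={amount}")
    if needs_human_review(model_confidence, amount):
        case.state = "pending_review"
        case.record("escalate", "rule", f"confidence={model_confidence}")
    else:
        case.state = "approved"
        case.record("approve", "model", f"confidence={model_confidence}")
    return case.state


c = Case("C-001")
print(resolve(c, model_confidence=0.92, amount=12_000))  # prints approved
print(len(c.audit))                                       # prints 2
```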
None of this is wrong. It’s the right thing to build. But it is worth naming clearly: the target architecture for a serious enterprise agentic deployment looks a great deal like a business process management platform with LLM capabilities embedded at the steps that need them. The cost model, the predictability, the auditability, and the governance all flow from getting that structure right.
FOR CONTEXT
Gartner has recognized Pega as a Leader in its Magic Quadrant for Intelligent Business Process Management Suites for many consecutive years, specifically for its ability to combine deterministic workflow orchestration with AI capabilities. The architectural argument for a hybrid model isn’t theoretical; it’s the direction the most mature enterprise deployments are already moving.
Pega’s “Agentic AI for Platform” model is one concrete expression of this architecture. The platform case price covers orchestration, case management, audit trail, governance, and GenAI capability in a single unit, because these aren’t separable concerns; they’re the same workflow, priced and managed together. The platform calls out to the model at designated steps; everything else runs deterministically. The cost is flat per case, regardless of workflow length.
This isn’t a sales point. It’s an architectural observation about where the value of a BPM platform comes from when AI is in the picture. The platform does the orchestration work that becomes quadratically expensive inside an autonomous agent without spending any tokens on it. That isn’t magic: deterministic code is not a language model, so it doesn’t need to accumulate a context window.
What to take from this
If you’re running or building fully autonomous agentic workflows at enterprise scale, a few things are worth pressure-testing:
1. Model your actual cost curve, not your per-step cost. The per-step number looks reasonable. The 40-step workflow number, once you account for context accumulation, tool result payloads, and orchestration output, may not. Build a model with your actual workflow length and tool call density before you commit to the architecture.
2. Be honest about what you’re building as you optimize. If you find yourself adding state management, deterministic routers, escalation rules, and audit logging to your autonomous framework, you are building a BPM platform. The question is whether you want to build it from scratch or start with one designed for exactly this purpose.
3. The predictability gap is as important as the cost gap. A flat, predictable per-case cost is not just cheaper in many scenarios; it is a fundamentally different planning input. Variable, volume-sensitive, model-price-sensitive token costs make budgeting, capacity planning, and executive reporting harder than they need to be.
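A starting point for the first item on that list might look like the sketch below. It is a simplified model under stated assumptions: step k reads a fixed system prompt plus everything added by earlier steps, tool calls occur on a fixed fraction of steps, and only input tokens are counted (output tokens and per-token prices are left out so the sketch doesn't depend on any model's pricing). Every number in `profile`, and the hybrid figures, are placeholders to replace with your own measurements; only the 350,000-case volume comes from the example at the top of this piece.

```python
from dataclasses import dataclass


@dataclass
class WorkflowProfile:
    """Illustrative inputs; replace every number with measurements from your workflow."""
    steps: int                  # orchestration steps per case
    system_prompt_tokens: int   # instructions + tool schemas re-sent on every call
    reasoning_tokens: int       # model output added to history per step
    tool_call_density: float    # fraction of steps that call a tool (0..1)
    tool_payload_tokens: int    # average size of a tool result entering history


def autonomous_input_tokens(w: WorkflowProfile) -> float:
    """Input tokens per case when a master agent re-reads its history every step.

    Step k (1-based) reads the system prompt plus everything added by steps
    1..k-1, so the history term sums to growth * n(n - 1) / 2: quadratic in n.
    """
    growth = w.reasoning_tokens + w.tool_call_density * w.tool_payload_tokens
    n = w.steps
    return n * w.system_prompt_tokens + growth * n * (n - 1) / 2


def hybrid_input_tokens(judgment_steps: int, focused_context_tokens: int) -> int:
    """Input tokens per case when the model runs only at judgment steps, each with
    a focused, bounded context: linear in the number of judgment steps."""
    return judgment_steps * focused_context_tokens


profile = WorkflowProfile(steps=40, system_prompt_tokens=3_000, reasoning_tokens=300,
                          tool_call_density=0.75, tool_payload_tokens=2_500)
per_case_agent = autonomous_input_tokens(profile)
per_case_hybrid = hybrid_input_tokens(judgment_steps=6, focused_context_tokens=4_000)

CASES_PER_MONTH = 350_000
print(f"{per_case_agent:,.0f} vs {per_case_hybrid:,} input tokens per case")
print(f"{per_case_agent * CASES_PER_MONTH / 1e9:,.1f}B vs "
      f"{per_case_hybrid * CASES_PER_MONTH / 1e9:,.1f}B input tokens per month")
```

Because this model counts only prior steps' history plus a fixed prompt, its totals differ slightly from the simplified series earlier in the piece, but the shape is the same: doubling `steps` roughly quadruples the history term.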
The future of enterprise AI is not purely autonomous agents running unconstrained. It is agents embedded in workflows that know what they’re for, governed by systems that know what should happen next, and priced in a way that makes sense at the scale your business actually operates.
The teams getting there fastest are the ones who stopped treating architecture as a detail and started treating it as the decision.
What does your cost curve actually look like?
If you’re modeling agentic AI workflows and want to pressure-test your assumptions on token costs, orchestration overhead, or architectural tradeoffs, I’m happy to compare notes. The math is more tractable than it looks once you build a proper model.


