Pega has recently produced a whitepaper well worth your time.
“Why Predictable AI Matters: Governing Decisions, Not Models, in the Age of the Token Economy” makes an argument that is fundamentally right for us, and we need to start talking to customers with this model in mind: we should be governing decisions, not models.
Generative AI breaks enterprise software economics in a way that can’t be fixed by picking cheaper models or negotiating better contracts. Tokens don’t get cheaper with scale; they compound with complexity, context, and agentic reasoning loops. The paper argues convincingly that governance must live at the decision layer, before tokens are spent. Pega’s architecture already sits at that point. But the paper also opens a door worth walking through more fully: the intelligence supply chain.
Many organisations are treating AI as a monolith: one frontier model, one API, one bill that grows non-linearly as usage scales. A better frame is the intelligence supply chain.
Just as a well-run physical supply chain sources from multiple providers — routing each requirement to whoever delivers the best combination of cost, quality, and speed — an intelligence supply chain routes cognitive tasks to the right execution venue at each step. Composable, auditable, and governed. The question isn’t which AI to use. It’s which form of intelligence belongs at this point in the process, and where it should run.
That supply chain has more tiers than the current LLM conversation acknowledges.
Rules engines handle deterministic, low-variance decisions at near-zero cost. That’s the baseline — and for a surprising proportion of enterprise process steps, it’s entirely sufficient.
Statistical AI handles decisions that need to be evidence-based and that improve over time from real outcomes. This is where a lot of the most commercially consequential decisions actually belong, and it’s where Pega Customer Decision Hub remains a genuinely distinct competitive differentiator. A robust and coherent offer to a customer — the right product, at the right moment, for that individual — cannot be reliably formulated by a generative model. LLMs generate plausible outputs based on training data. That is a fundamentally different thing from a propensity model that has learned from millions of your actual customer interactions, constrained by eligibility rules, value optimisation, and real-time context. Next Best Action done properly requires all of that. CDH delivers it, and in a world where everyone is chasing generative AI, that capability is more valuable, not less.
Then there’s the local model tier, which I think is the most underappreciated part of this picture. We are now in a world where capable open-weight models can run on your own infrastructure. For bounded language tasks such as intent classification, information extraction, summarisation, routing, and response suggestion, these models do the job well, and on narrow, well-defined tasks they can match or even outperform frontier models. More importantly, once the infrastructure is in place they carry essentially no per-token cost.
The economics flip entirely: a fixed infrastructure investment that pays back across millions of interactions, rather than a continuous, unpredictable operational cost that scales with every customer conversation. Add data sovereignty (nothing transits a third-party API) and sub-second local latency, and for high-volume operational tasks local inference isn’t a compromise. It’s frequently the right answer on every dimension simultaneously.
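To make the pay-back argument concrete, here is a minimal sketch of the break-even arithmetic. The function name, its parameters, and any figures you feed it are illustrative assumptions, not numbers from the Pega paper or from any vendor’s price list.

```python
import math
from typing import Optional


def break_even_interactions(
    fixed_cost: float,                 # assumed up-front spend on local infrastructure
    tokens_per_interaction: float,     # assumed average tokens consumed per interaction
    price_per_1k_tokens: float,        # assumed metered API price per 1,000 tokens
    local_cost_per_interaction: float = 0.0,  # assumed marginal cost of local inference
) -> Optional[int]:
    """Return the number of interactions after which local inference has
    paid back its fixed cost, or None if it never does under these inputs."""
    api_cost_per_interaction = tokens_per_interaction / 1000 * price_per_1k_tokens
    saving_per_interaction = api_cost_per_interaction - local_cost_per_interaction
    if saving_per_interaction <= 0:
        return None  # the metered API is no more expensive per interaction
    return math.ceil(fixed_cost / saving_per_interaction)


if __name__ == "__main__":
    # Hypothetical inputs purely for illustration.
    print(break_even_interactions(
        fixed_cost=50_000,
        tokens_per_interaction=2_000,
        price_per_1k_tokens=0.01,
    ))
```

The point of the sketch is the shape of the calculation rather than the inputs: the higher the interaction volume, the sooner a fixed investment overtakes a cost that is paid again on every conversation.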
This is the core of breaking free from the frontier model cost trap. The Intelligence as a Service model works well when you need it. But most of what enterprises do operationally doesn’t need it, and routing those workloads to local execution doesn’t just reduce cost; it removes a structural dependency entirely.
Frontier models absolutely earn their place, but for different kinds of work. Pega Blueprint is a good example: using generative AI creatively at design time to conceive workflow structures, case types, and decision architectures is exactly the right use of frontier model capability. It demands creativity, broad reasoning, and synthesis across complex requirements. Critically, it happens once — or at least infrequently — and the output is a deterministic, rules-based workflow that then runs at scale with no ongoing AI cost. The frontier model pays for itself many times over by generating the structure that makes subsequent decisions cheap and predictable. Similarly, drafting a bespoke communication for a genuinely exceptional case, or synthesising knowledge across complex unstructured documents, justifies the cost precisely because these are not continuous operational workloads.
The principle that emerges is simple enough: frontier models are investments in structure and creativity. Local models, statistical AI, and rules are the operational engine. Pega’s architecture, with its decision-layer governance, is what holds the supply chain together, routing each task to the right tier, at the right cost, with the right level of accountability. We are not there yet for local models, but surely that is inevitable?
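As a rough illustration of that routing principle, the sketch below sends each task to the cheapest tier that can plausibly handle it and falls back to a frontier model only when nothing else fits. The `Tier`, `Task`, and `route` names and the three boolean flags are hypothetical, invented for this example; this is not how Pega’s decision layer is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    RULES = "rules engine"
    STATISTICAL = "statistical AI"
    LOCAL_MODEL = "local open-weight model"
    FRONTIER = "frontier model"


@dataclass(frozen=True)
class Task:
    name: str
    deterministic: bool = False         # fully expressible as explicit rules
    learns_from_outcomes: bool = False  # e.g. propensity or next-best-action decisions
    bounded_language: bool = False      # e.g. classification, extraction, summarisation


def route(task: Task) -> Tier:
    """Return the cheapest tier assumed able to handle the task.

    The checks run from the cheapest tier to the most expensive one,
    so the frontier model is reached only as a fallback.
    """
    if task.deterministic:
        return Tier.RULES
    if task.learns_from_outcomes:
        return Tier.STATISTICAL
    if task.bounded_language:
        return Tier.LOCAL_MODEL
    return Tier.FRONTIER


if __name__ == "__main__":
    examples = [
        Task("eligibility check", deterministic=True),
        Task("next best offer", learns_from_outcomes=True),
        Task("intent classification", bounded_language=True),
        Task("design a new workflow"),
    ]
    for task in examples:
        print(f"{task.name} -> {route(task).value}")
```

Because the routing decision is made before any model is called, it is also the natural place to log which tier handled which task, which is what makes the supply chain auditable.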
Competitive advantage in AI won’t belong to whoever has the most powerful model. It’ll belong to whoever builds the most intelligent supply chain around it. Pega is in an excellent position to do that.
I’d love to hear how others are thinking about this. Are you starting to architect AI as a pipeline of distinct tiers — statistical, rules-based, and generative? And are conversations about edge inference and data sovereignty entering your client discussions yet?