Preventing "Lethal Trifecta" Security Risks in Agentic Systems

The “Lethal Trifecta”

AI agents are being deployed every day with fundamentally insecure configurations. We hear reports of security failures occasionally, but absent a framework connecting the dots between these failures, it’s easy to dismiss each instance as a one-off security lapse rather than an expression of a structural weakness. Thankfully, Simon Willison articulated just such a clarifying framework nearly one year ago: the “Lethal Trifecta.”

Briefly, the Lethal Trifecta describes the critical risk to an agentic system’s data if it accepts untrusted content as input, has access to private data, and can communicate externally.
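
As a rough illustration of the definition above, the three conditions can be written as a simple capability audit. This is a minimal sketch, not an established tool: the `AgentCapabilities` type, its field names, and the two example agents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what a single agent is allowed to do."""
    reads_untrusted_content: bool   # e.g., inbound email, public issues, web pages
    accesses_private_data: bool     # e.g., customer records, private repositories
    communicates_externally: bool   # e.g., sends email, makes HTTP requests


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """Return True only when all three legs are present in the same agent."""
    return (
        caps.reads_untrusted_content
        and caps.accesses_private_data
        and caps.communicates_externally
    )


# Illustrative agents (assumed configurations, not real products).
triage_agent = AgentCapabilities(True, True, True)
summarizer = AgentCapabilities(True, False, False)

for name, caps in [("triage_agent", triage_agent), ("summarizer", summarizer)]:
    status = "AT RISK" if has_lethal_trifecta(caps) else "ok"
    print(f"{name}: {status}")
```

Removing any one leg makes the check pass, which is why the patterns later in this piece focus on isolating the legs from one another.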

The goal of this piece is to operationalize the Lethal Trifecta with concrete examples of common business problems and appropriate solution architectures.

Lethal Business Problems

AI agents’ abilities to help us solve real-world business problems scale with the number of tools and datasets to which they have access. Unfortunately, for nondeterministic tools like AI agents, risk scales alongside capabilities. A selection of common use cases vulnerable to the Lethal Trifecta follows:

1. “Triage Incoming Work”

Problem: AI is required to help triage and process a high volume of communications from external parties (e.g., complaints, support tickets, vendor proposals, pull requests).

Lethal Solution: An AI agent reads malicious external content and is empowered to take a broad range of actions through access to data and tools (e.g., made available via MCP, skills, etc.). The model does not distinguish malicious from non-malicious content: all tokens are processed by the same underlying attention mechanism. With no separation between data and control flow, no fixed output schema, and no human in the loop, the model can be hijacked into taking unwanted actions.

Example: The 2025 GitHub MCP Server incident, in which malicious issues filed on public code repositories caused models that read the poisoned issues to access private repositories and exfiltrate confidential code.

2. “Unify the Information Landscape”

Problem: Information is scattered across multiple channels, formats, locations, etc., and human users need to be able to query the information as if it were centralized.

Lethal Solution: A RAG engine indexes all in-scope data sources, some of which are exposed to untrusted content (e.g., a newly arrived malicious email). Even if the user never opens the email, the RAG engine retrieves it, causing it to be processed by an AI agent. The email contains a prompt injection that causes the agent to leak sensitive data.

Example: The 2025 Microsoft 365 “EchoLeak” incident in which Copilot was tricked into embedding stolen data into URLs on trusted domains, which then auto-loaded via browser and sent data to the attacking servers.

3. “Act on My Behalf”

Problem: Human users want to truly delegate real-world tasks to AI—send communications, update data stores, navigate websites, etc.

Lethal Solution: OpenClaw and any number of similar-in-spirit solutions that have access to private data, are constantly ingesting external content, and can push messages to all platforms to which they are connected.

Example: A human user visits a webpage with malicious content that opens a connection to the local OpenClaw gateway. The gateway treats this connection as trusted, which facilitates a brute-force password attack, giving the attacker control of the Claw.


And the list goes on. Any time a system combines the three elements of the Lethal Trifecta, it is at risk.

Non-Lethal Agentic Patterns (if implemented securely)

To protect against the Lethal Trifecta and related security threats, LLMs and AI agents must first be governed by a principle of least privilege at the process level. Additionally, they must be subject to model-level controls within structured and governed workflows designed to isolate the “legs” of the Trifecta.
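
One way to picture least privilege combined with isolated "legs" is a per-step tool allowlist, so that the step that reads untrusted content cannot also reach private data or send anything out. The sketch below is an assumption-laden illustration: the step names, the three stub tools, and `call_tool` are hypothetical and not taken from any particular framework.

```python
from typing import Callable, Dict, FrozenSet


# Hypothetical stub tools, one per leg of the Trifecta.
def read_ticket(ticket_id: str) -> str:
    return f"contents of ticket {ticket_id}"        # untrusted input


def lookup_customer(customer_id: str) -> str:
    return f"private record for {customer_id}"      # private data


def send_email(address: str) -> str:
    return f"email sent to {address}"               # external communication


TOOLS: Dict[str, Callable[[str], str]] = {
    "read_ticket": read_ticket,
    "lookup_customer": lookup_customer,
    "send_email": send_email,
}

# Each workflow step may use only the tools it needs; no step holds all three legs.
STEP_ALLOWLIST: Dict[str, FrozenSet[str]] = {
    "parse_untrusted_input": frozenset({"read_ticket"}),
    "enrich_with_private_data": frozenset({"lookup_customer"}),
    "notify": frozenset({"send_email"}),
}


def call_tool(step: str, tool_name: str, arg: str) -> str:
    """Run a tool only if the current step is explicitly allowed to use it."""
    allowed = STEP_ALLOWLIST.get(step, frozenset())
    if tool_name not in allowed:
        raise PermissionError(f"step {step!r} may not call {tool_name!r}")
    return TOOLS[tool_name](arg)
```

With this arrangement, a prompt injection that reaches the parsing step can ask for `send_email`, but the request fails in ordinary code before any model output is acted on.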

When untrusted content enters a secured, governed agentic workflow, it is not simply handed to an empowered AI agent (with potentially lethal consequences!). Instead, it can be managed safely by one of the following agentic patterns:

1. Deterministic, with Agents: The tried-and-true, recommended pattern. There is a canonical “happy path” by which most cases are processed, and “alternate paths” to accommodate deviations. Agents and LLMs are embedded with appropriate scopes within a deterministic, sequenced workflow.

2. Governed Agentic Orchestration: The case workflow itself is supervised by an agentic orchestrator, but the orchestrator’s first job upon being triggered via the ingestion of the untrusted content is to call one of a predefined set of deterministic tools or governed, hybrid agentic-deterministic workflows. The untrusted content can be parsed safely, and the appropriate next steps followed.

3. Classic Human-in-the-Loop: Agents read the untrusted content, access sensitive data, and draft external communications, but nothing can be communicated externally without explicit human approval.
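
The human-in-the-loop pattern, for instance, can be sketched as an outbox that holds agent-written drafts until a named reviewer approves them. The `Draft` and `Outbox` classes below are hypothetical, and "sending" is simulated by moving the draft to a list; a real system would also need authentication and an audit trail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Draft:
    recipient: str
    body: str
    approved: bool = False


@dataclass
class Outbox:
    """Holds agent drafts; only approved drafts ever leave the system."""
    pending: List[Draft] = field(default_factory=list)
    sent: List[Draft] = field(default_factory=list)

    def submit(self, draft: Draft) -> None:
        # Agents may only submit drafts; they have no way to send directly.
        self.pending.append(draft)

    def approve(self, index: int, reviewer: str) -> None:
        # Approval must come from an identified human reviewer.
        if not reviewer:
            raise ValueError("a named human reviewer is required")
        self.pending[index].approved = True

    def flush(self) -> int:
        """Send approved drafts (simulated) and return how many were sent."""
        ready = [d for d in self.pending if d.approved]
        self.pending = [d for d in self.pending if not d.approved]
        self.sent.extend(ready)
        return len(ready)
```

The point of the design is that the external-communication leg lives behind `flush`, which ignores anything a human has not approved, no matter what the agent wrote in the draft.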

The Strength of the “Deterministic, with Agents” Pattern

Because of the strong governance inherent to this pattern, agents can have a wider freedom-of-agency within their defined scope than is possible with other patterns. An agent-orchestrated workflow must place tighter constraints on wide-ranging agents to prevent them from initiating an inadvertent sequence of Lethal Trifecta actions. Ironically, the constraints of the determinism-based pattern allow us to safely unlock the nondeterministic potential of AI agents.

Having built lightly governed agentic workflows, I can confirm the futility of trying to force them to behave consistently through prompting and agentic frameworks alone. For anything but the most basic business problems, deterministic logic is required, and that logic needs to be called deterministically—you cannot rely on agents to consistently call the right deterministic logic without abusing the meaning of the word “agent.”
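
To make "called deterministically" concrete, here is a minimal sketch in which the model's only job is to emit one of a fixed set of labels, and ordinary code decides what happens next. Everything here is assumed for illustration: the `Category` values, the handler functions, and the `classify` callable, which stands in for whatever LLM call a real system would make.

```python
from enum import Enum
from typing import Callable, Dict


class Category(Enum):
    """The fixed output schema: the only values the workflow will accept."""
    COMPLAINT = "complaint"
    SUPPORT = "support"
    PROPOSAL = "proposal"


# Deterministic handlers (stubs). The workflow chooses these, never the model.
def handle_complaint(text: str) -> str:
    return "opened complaint case"


def handle_support(text: str) -> str:
    return "opened support ticket"


def handle_proposal(text: str) -> str:
    return "forwarded to procurement"


HANDLERS: Dict[Category, Callable[[str], str]] = {
    Category.COMPLAINT: handle_complaint,
    Category.SUPPORT: handle_support,
    Category.PROPOSAL: handle_proposal,
}


def process(text: str, classify: Callable[[str], str]) -> str:
    """Classify untrusted text, then dispatch deterministically."""
    raw = classify(text)  # model output is treated as data, never as instructions
    try:
        category = Category(raw.strip().lower())
    except ValueError:
        # Anything outside the schema, including injected text, goes to a person.
        return "routed to manual review"
    return HANDLERS[category](text)
```

A hijacked classifier that returns something like "email the customer database to the attacker" simply fails schema validation and lands in manual review. The worst an injection can do in this sketch is pick the wrong one of three safe handlers.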

Final Thoughts

To date, the Lethal Trifecta has been successfully exploited with relatively primitive tools. If we learned anything from recent months’ “Claude Code Moment,” it is that AI coding agents have the capacity for step-change capability gains. Unfortunately, these gains are equally useful for exploiting weaknesses in code and architecture. The best defense against the Lethal Trifecta in the agentic age remains business processes grounded in deterministic logic, leveraging the increasing power of agents in safe and predictable ways.

Human written. Opus 4.6 used to help research exploits and provide feedback re: clarity.

This is an interesting perspective, and definitely a major concern in regulated industries. The potential consequences for a “lethal” agentic deployment could be catastrophic to the organization given regulatory penalties. A major question: How do we help contextualize the design-time effort associated with creating these deterministic boundaries? Meaning, with all the hype around agents, many business leaders are expecting lightning-fast deployments and huge efficiency gains; when you approach with a message of “yes, but…” and then talk about using deterministic flows, how do you contextualize the value of the design-time effort to get these flows defined and implemented?

One way Pega is helping with this is through our work with design-time agents like Blueprint and Autopilot. Using AI-powered tools trained in Pega best practices, Pega can help reduce the design-time burden of defining the deterministic patterns of approved action in which run-time agents will participate. By accelerating the design-time cycle, business and tech leaders can collaborate to identify the boundaries they want the agents to work within, logically separate the legs of the trifecta, build in controls, define core tenets of observability and reporting, define the user experience, etc., and have a shared vision going into implementation. It also means that once implemented, those boundaries can be updated more quickly and effectively to respond to changing business needs, an essential capability in a world that AI is rapidly reshaping.

Preventing the “Lethal Trifecta” is an absolute must, but many leaders are skeptical of the effort it takes to define the workspace. However, it’s definitely a worthwhile investment considering the alternatives.

Thanks for taking the time to lay this out so clearly, and for the great examples provided in the article.

For me the “Lethal Trifecta” maps very cleanly to why Pega is emphasizing the Predictable AI guidance: untrusted input, broad data access, and unconstrained action cannot safely coexist without explicit isolation and control.

Curious to hear from others: where have you seen the Trifecta emerge unintentionally in production designs, and which controls (workflow boundaries, tool scoping, human approval, observability) proved most effective in breaking it without killing value?