An AI Agent Hacked McKinsey's AI Platform in 2 Hours. Here's What That Means for Every Organisation.

The Lilli breach exposes a new class of vulnerability: AI systems that govern nothing, protected by governance that covers everything except AI.

Roshan Ghadamian··10 min read

What happened

On 28 February 2026, security startup CodeWall pointed its autonomous offensive agent at McKinsey & Company's internal AI platform, Lilli. No credentials. No insider knowledge. Just a domain name.

Within two hours, the agent had full read and write access to the production database.

Lilli is not a prototype. It is McKinsey's primary internal AI platform, launched in 2023, used by 72% of the firm's 43,000+ employees, processing over 500,000 prompts per month. It handles chat, document analysis, RAG over decades of proprietary research, and AI-powered search across 100,000+ internal documents. It is named after Lillian Dombrowski, the first professional woman hired by McKinsey in 1945.

The agent mapped the attack surface and found the API documentation publicly exposed — over 200 endpoints, fully documented. Most required authentication. Twenty-two did not. One of those unauthenticated endpoints wrote user search queries to the database. The values were safely parameterised, but the JSON keys — the field names — were concatenated directly into SQL.

This is the kind of vulnerability that traditional scanners miss. OWASP ZAP did not flag it. The CodeWall agent found it because it does not follow checklists. It maps, probes, chains, and escalates — the same way a skilled human attacker would, but continuously and at machine speed.

The scale of what was exposed

The numbers are staggering:

AssetCount
Chat messages46.5 million
Files (PDFs, Excel, PowerPoint, Word)728,000
User accounts57,000
AI assistants384,000
Workspaces94,000
RAG document chunks3.68 million
Files via external AI APIs1.1 million
Agent messages routed externally217,000

These are not abstract data points. The 46.5 million chat messages come from a workforce that uses Lilli to discuss strategy, client engagements, financials, M&A activity, and internal research. Every conversation was stored in plaintext and accessible without authentication.

The 728,000 files include 192,000 PDFs, 93,000 Excel spreadsheets, and 93,000 PowerPoint decks. The filenames alone were sensitive — and each had a direct download URL.

But the most consequential exposure was not the data. It was the prompt layer.

The prompt layer is the new crown jewel

Beyond the database, the agent found system prompts and AI model configurations — 95 configs across 12 model types — revealing exactly how Lilli was instructed to behave, what guardrails existed, and the full model stack including fine-tuned models and deployment details.

Critically, the SQL injection was not read-only. Lilli's system prompts were stored in the same database the agent had compromised. An attacker with write access could have rewritten those prompts with a single UPDATE statement. No deployment needed. No code change. One HTTP call.

The implications for 43,000 consultants relying on Lilli for client work:

Poisoned advice — subtly altering financial models, strategic recommendations, or risk assessments. Consultants would trust the output because it came from their own internal tool.

Data exfiltration via output — instructing the AI to embed confidential information into its responses, which users might then copy into client-facing documents or external emails.

Guardrail removal — stripping safety instructions so the AI would disclose internal data, ignore access controls, or follow injected instructions from document content.

Silent persistence — unlike a compromised server, a modified prompt leaves no log trail. No file changes. No process anomalies. The AI just starts behaving differently, and nobody notices until the damage is done.

Organisations have spent decades securing their code, their servers, and their supply chains. But the prompt layer — the instructions that govern how AI systems behave — is the new high-value target, and almost nobody is treating it as one.

Why traditional security was not enough

McKinsey is not a startup with three engineers. They have world-class technology teams, significant security investment, and the resources to do things properly. Lilli had been running in production for over two years. Their internal scanners found nothing.

The vulnerability was SQL injection — one of the oldest bug classes in the book, well understood since the late 1990s. How did it survive?

Because the attack surface was novel. The values were parameterised (correct), but the JSON keys were concatenated into SQL (incorrect). This is a pattern that emerges when AI platforms accept flexible, schema-less input — the kind of dynamic JSON structures that LLM-powered applications naturally produce. Traditional security scanners are trained on known patterns: form fields, query parameters, URL segments. They do not reason about whether arbitrary JSON keys might end up in a SQL statement.

The autonomous agent found it because it does not follow a checklist. It observed that JSON keys were reflected in error messages, recognised the pattern as potential SQL injection, and iterated through fifteen blind probing attempts until production data started flowing back.

This is the new threat model. The cost of sophisticated attacks has collapsed. What previously required a skilled penetration tester spending weeks can now be accomplished by an autonomous agent in hours. The attack is not smarter — it is faster, more persistent, and infinitely scalable.

The governance failure underneath the security failure

The surface-level story is a security vulnerability. The deeper story is a governance failure.

Consider what had to be true for this breach to be possible:

No inventory of what was exposed. Twenty-two unauthenticated endpoints existed in production. In any governance framework worth its name, every public-facing endpoint would be explicitly catalogued, with a documented justification for why it does not require authentication.

No integrity monitoring on the prompt layer. System prompts — the instructions that control how an AI serving 43,000 people behaves — were stored in a database with no write protection, no version history, and no change detection. In governance terms, these are constitutional documents. They define the rules of the system. And they had no governance around them at all.

No audit trail for prompt modifications. If an attacker had rewritten Lilli's system prompts, there would have been no record. No alert. No way to detect the change, revert it, or understand when it happened. The AI would simply start producing different outputs, and the 43,000 users relying on it would have no reason to question what they received.

No governance trace on AI actions. Lilli processes 500,000+ prompts per month. Each prompt results in an action — a search, a document retrieval, a response generation. None of these actions were governed in the institutional sense. There was no mechanism to check whether a given action was authorised, within scope, or consistent with organisational constraints.

This is governance debt in its most concentrated form. McKinsey had the resources. They had the security teams. They had the awareness. What they did not have was a governance layer that treated AI platform operations as institutional actions requiring the same oversight as human decisions.

What this means for agentic AI

The Lilli breach is not primarily a story about SQL injection. It is a story about what happens when AI systems operate outside governance structures.

Agentic AI changes the threat model in two directions simultaneously:

AI as attacker. The CodeWall agent that found this vulnerability operated autonomously. It selected the target, mapped the attack surface, identified the vulnerability, and exploited it — all without human intervention. This is not theoretical. It happened. And the cost was two hours of compute time, not weeks of a penetration tester's salary. When exploit discovery and attack logic are embedded into autonomous workflows, the cost of sophisticated attacks drops to near zero while the scale approaches infinity.

AI as target. Every organisation deploying AI internally has created a new attack surface: the prompt layer. This layer controls what the AI does, how it responds, what it refuses, and what it discloses. It is almost never governed. There are no access controls on prompt modifications, no version history, no integrity monitoring, no audit trail. The prompts are stored in databases, passed through APIs, and cached in config files — treated as configuration rather than as the constitutional documents they actually are.

The intersection of these two trends is where the real risk lives. Autonomous agents attacking AI systems whose governance layer is absent. The attacker is faster than your security team. The target has no governance infrastructure to detect, prevent, or recover from compromise.

This is not a problem that more firewalls or better scanners will solve. OWASP ZAP ran against Lilli and found nothing. The problem is structural: AI systems that take consequential actions need governance infrastructure, not just security infrastructure.

What governance infrastructure actually looks like

If Lilli had been operating within a governance framework, the attack chain would have been interrupted at multiple points:

Endpoint governance. Every API endpoint classified by exposure level, with unauthenticated endpoints explicitly approved and reviewed on a cadence. The twenty-two unprotected endpoints would have been flagged during constraint evaluation — not by a scanner looking for known vulnerability patterns, but by a governance system asking: "Is this endpoint authorised to operate without authentication? Who approved this? When was it last reviewed?"

Prompt integrity. System prompts treated as governed documents — versioned, access-controlled, with every modification recorded in an immutable audit trail. A write to the prompt table triggers a governance check: "Who is modifying this prompt? Do they have the authority? Is this change consistent with institutional constraints?" Any unauthorised modification is blocked before it reaches the database.

Action-level governance traces. Every operation the AI performs — every query, every document retrieval, every response generation — recorded in a governance trace that captures who initiated it, what constraints were evaluated, and whether any violations were detected. When the autonomous agent began its blind SQL injection probes, the first anomalous query pattern would have generated a governance signal.

Reconciliation. External events (database queries, API calls, file access) continuously reconciled against governance traces. Actions that occur without a corresponding governance trace are flagged as ungoverned — exactly the kind of signal that would catch an attacker operating outside the governance system.

This is not hypothetical architecture. This is what institutional AI governance looks like when it is built as infrastructure rather than bolted on as compliance theatre.

The uncomfortable question

McKinsey advises the world's largest organisations on risk, governance, and digital transformation. The firm has published extensively on AI governance frameworks, responsible AI principles, and enterprise security posture. And yet their own internal AI platform had twenty-two unauthenticated endpoints, an SQL injection in a search feature, and a prompt layer with no integrity protection.

This is not an indictment of McKinsey specifically. It is an indictment of how the entire industry treats AI governance. The standard approach is frameworks, principles, and guidelines — documents that describe what should happen but create no mechanism to ensure it does. The gap between "we have an AI governance framework" and "our AI systems are actually governed" is the gap that the CodeWall agent walked through.

Every organisation running an internal AI platform should ask itself three questions:

1. Do you know every endpoint your AI platform exposes, and which ones do not require authentication? Not what the documentation says. What is actually deployed in production, right now.

2. Can someone modify your AI's system prompts, and would you know if they did? Not whether it is theoretically possible. Whether you have infrastructure that would detect, alert, and revert an unauthorised prompt modification within minutes.

3. If an autonomous agent started probing your AI platform right now, what would catch it? Not your firewall. Not your WAF. What in your governance infrastructure would identify anomalous patterns, correlate them with ungoverned actions, and escalate before production data starts flowing?

If the answer to any of these is "I don't know" or "nothing," then you have the same vulnerability McKinsey had. The only difference is that nobody has pointed an autonomous agent at you yet.

A note on the source

CodeWall is a security startup that sells autonomous offensive security tools. This disclosure functions as a product demonstration. That context matters and should inform how the claims are read.

However, several facts are independently verifiable: McKinsey acknowledged receipt of the disclosure within 24 hours, patched all unauthenticated endpoints within 48 hours, and took the development environment offline. These are not the actions of a company disputing the findings.

The broader point stands regardless of the commercial context. The vulnerability class — dynamic JSON keys concatenated into SQL — is real, well-understood by security researchers, and specifically difficult for traditional automated scanners to detect. The threat model — autonomous agents conducting offensive security at machine speed — is not theoretical. And the governance gap — AI systems operating without institutional oversight of their prompt layer, their action traces, or their exposure surface — exists in virtually every enterprise AI deployment today.

The question is not whether CodeWall's marketing is aggressive. The question is whether your organisation's AI governance is real.

Related Comparisons

See governance infrastructure in action

Constellation enforces corporate governance at the moment of action — for both humans and AI agents.