How We Govern Our Own AI Agents (Dog-Food Case Study)
Constellation uses its own governance infrastructure to constrain Claude Code in production
The Setup
Constellation is built using AI agents. Specifically, Claude Code, Anthropic's agentic coding tool, is our primary development agent. It operates autonomously: it reads files, writes code, runs commands, manages git operations, and interacts with external services. It has broad access to the codebase and can take consequential actions.
This creates an obvious question: if Constellation is a governance platform for AI agents, shouldn't it govern its own AI agents? The answer is yes, and we have been running this setup since early 2026.
The mechanism is Constellation's **MCP governance gate** — a Model Context Protocol server that intercepts Claude Code's actions before they execute, checks them against institutional constraints, and either permits, blocks, or escalates them. The governance gate runs as a local MCP server that Claude Code connects to as part of its tool chain. Every action Claude Code takes passes through the gate.
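In outline, the gate's decision loop looks something like the following. This is a minimal sketch: the `Verdict` and `Action` names and the callable-constraint shape are our illustration, not Constellation's actual API.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    PERMIT = "permit"
    BLOCK = "block"
    ESCALATE = "escalate"

@dataclass
class Action:
    tool: str      # e.g. "bash", "write_file" -- the tool the agent invoked
    payload: str   # the command or content the agent wants to execute

def evaluate(action: Action, constraints) -> Verdict:
    """Run every constraint against the action; the most restrictive verdict wins."""
    verdict = Verdict.PERMIT
    for constraint in constraints:
        result = constraint(action)
        if result == Verdict.BLOCK:
            return Verdict.BLOCK           # a hard block short-circuits everything
        if result == Verdict.ESCALATE:
            verdict = Verdict.ESCALATE     # remember, but keep checking for blocks
    return verdict
```

Because the gate sits in the tool chain, the agent never sees a choice between "governed" and "ungoverned" execution paths: every action is an `Action`, and every `Action` gets a `Verdict` before anything runs.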
This is not a demo or a proof of concept. It is our production development workflow. Every commit in the Constellation repository has been governed by Constellation.
What We Constrain
Our constraint set has evolved through practical experience. Here are the categories of constraints we enforce on Claude Code today.
Git operations. Claude Code cannot push directly to the main branch. All changes must go through staging first. Force pushes are blocked entirely — there is no override. These constraints exist because a single bad push to main triggers a production deployment, and reversing a production deployment is significantly more costly than preventing a bad push.
Spending and resource commitment. Any action that commits financial resources — creating paid API subscriptions, provisioning cloud infrastructure, or purchasing services — requires human escalation above a defined threshold. Claude Code can suggest these actions and prepare the configuration, but it cannot execute them without explicit approval.
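A threshold constraint of this kind reduces to a small predicate. The sketch below is illustrative: the function name and the $500 default are our assumptions, not Constellation's configured values.

```python
def spend_verdict(amount_usd: float, threshold_usd: float = 500.0) -> str:
    """Hypothetical spending constraint: amounts at or below the threshold
    are permitted; anything above requires explicit human approval."""
    return "escalate" if amount_usd > threshold_usd else "permit"
```

The point is that the agent can still prepare the purchase; only the final commit of funds is routed through a human.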
Schema and data changes. Database migrations require explicit human review before execution. Claude Code can generate migration files and even test them against a development database, but applying them to staging or production requires escalation. This constraint exists because schema changes are effectively irreversible at scale.
Documentation commitments. This is an unusual constraint that reflects our development philosophy. Every session that changes code must end with documentation updates, a passing build, a descriptive commit, and a push to staging. This is enforced as a governance constraint, not a cultural norm. Claude Code is reminded of this obligation and its completion is tracked.
External communications. Claude Code cannot send messages to external services (Slack, email, webhooks) without human review. Internal logging is unrestricted, but anything that leaves the system boundary requires approval.
What We Have Learned
Six months of dog-fooding has produced insights that we could not have gained any other way.
Constraints must be precise, not aspirational. Early constraints like "be careful with production systems" were useless. Claude Code interpreted them contextually and sometimes incorrectly. Replacing them with specific, machine-readable rules — "cannot execute commands matching `git push origin main`" — eliminated ambiguity. The lesson: governance constraints are code, not prose.
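"Constraints are code" can be taken literally. A machine-readable version of the git rules above might look like this; the patterns and helper name are illustrative, not our production rule set.

```python
import re

# Hypothetical machine-readable constraints; patterns are illustrative only.
BLOCKED_COMMAND_PATTERNS = [
    re.compile(r"git\s+push\s+(\S+\s+)?main\b"),  # no direct pushes to main
    re.compile(r"git\s+push\s+.*--force"),        # no force pushes, ever
]

def command_permitted(command: str) -> bool:
    """Return False if the shell command matches any blocked pattern."""
    return not any(p.search(command) for p in BLOCKED_COMMAND_PATTERNS)
```

Unlike "be careful with production systems", a rule like this has exactly one interpretation, and the agent and the gate always agree on it.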
The governance gate adds negligible latency. Our initial concern was that pre-execution constraint checking would slow down development. In practice, the gate evaluation takes 50-200ms per action — imperceptible in a development workflow where the human is reviewing output anyway. The performance cost of governance is not zero, but it is close enough to zero that it has never been a practical concern.
Escalations are information, not interruptions. We expected escalations to be annoying — the agent hits a constraint, work stops, a human must intervene. In practice, escalations are valuable signals. They tell us what the agent is trying to do, why it exceeds current authority, and whether the constraint should be adjusted. About 30% of our constraint refinements originated from analysing escalation patterns.
Shadow mode is essential for new constraints. When we add a new constraint, we run it in shadow mode first — the constraint is evaluated but not enforced, and the would-be violation is logged. This lets us calibrate the constraint before it affects the workflow. Without shadow mode, new constraints are either too loose (permitting what they should block) or too tight (blocking legitimate actions and frustrating the developer).
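The mechanics of shadow mode are simple: evaluate, log, but always permit. A sketch, with names and the log shape of our own invention:

```python
from datetime import datetime, timezone

shadow_log = []  # in practice this would feed the weekly calibration review

def apply_constraint(action: str, constraint, shadow: bool) -> str:
    """In shadow mode, record the would-be verdict but never enforce it."""
    verdict = constraint(action)
    if shadow:
        if verdict != "permit":
            shadow_log.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "action": action,
                "would_be_verdict": verdict,
            })
        return "permit"  # shadow constraints observe; they do not block
    return verdict
```

Reviewing `shadow_log` for a week tells you whether a proposed constraint would have blocked anything legitimate before you turn enforcement on.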
Friction Points
Dog-fooding is valuable precisely because it reveals friction that users would experience. Here are the friction points we have identified and how we have addressed them.
Constraint conflicts. Two constraints can conflict: "all code changes must be committed before session end" and "commits to the database migration directory require human review." If a session includes a migration, Claude Code must escalate to commit — but the session-end constraint requires a commit. We resolved this by implementing constraint priority ordering and allowing escalation to satisfy both constraints simultaneously.
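One way to implement priority ordering is to let only the highest-priority tier be binding, with escalation winning within a tier, since a human approval can satisfy both obligations at once. This sketch is our illustration of the idea, not Constellation's resolver.

```python
from dataclasses import dataclass

@dataclass
class RuleVerdict:
    rule: str
    verdict: str   # "permit" | "block" | "escalate"
    priority: int  # higher number = higher priority

SEVERITY = {"permit": 0, "escalate": 1, "block": 2}

def resolve(verdicts: list[RuleVerdict]) -> str:
    """Only the highest-priority tier is binding; within it, the most
    restrictive verdict wins. An escalation from a high-priority rule can
    therefore discharge a lower-priority obligation without bypassing it."""
    if not verdicts:
        return "permit"
    top = max(v.priority for v in verdicts)
    binding = [v for v in verdicts if v.priority == top]
    return max(binding, key=lambda v: SEVERITY[v.verdict]).verdict
```

In the migration example, the higher-priority review rule escalates, the human approves the commit, and the session-end rule is satisfied by the same approved commit.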
Stale constraints. Constraints that were relevant during one project phase become irrelevant during another. A constraint blocking changes to a specific module during a refactoring freeze should be removed when the freeze ends. Without a review cadence, stale constraints accumulate and create unnecessary friction. We now attach review dates to all time-bound constraints and surface them in the weekly governance digest.
Context loss on escalation. When Claude Code escalates an action, the human reviewer needs context: what was the agent trying to do, why, and what constraint triggered the escalation? Early escalations included minimal context — essentially "action blocked, please approve." We improved this by requiring the governance gate to include the full action description, the triggering constraint, and the agent's assessment of why the action was appropriate. This reduced escalation resolution time significantly.
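The fix amounts to making the escalation payload carry everything the reviewer needs. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class Escalation:
    action: str                  # full description of what the agent wants to do
    triggering_constraint: str   # which rule fired
    agent_rationale: str         # the agent's case for why the action is appropriate

def escalation_message(e: Escalation) -> str:
    """Serialise the full context so the reviewer never sees a bare 'action blocked'."""
    return json.dumps(asdict(e), indent=2)
```

Anything less than these three fields pushes the reviewer back into the agent's session to reconstruct intent, which is where the resolution time went.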
The "just let me do it" impulse. The strongest friction is psychological, not technical. When you are in flow and the governance gate blocks an action, the impulse is to override the constraint. We deliberately made overrides require a formal process — you must record why you are overriding, and the override is logged in the governance trace. This friction is intentional: it forces a moment of deliberation before bypassing governance.
Results
After six months of governing our own AI agents with Constellation, we can quantify the impact.
Zero unintended production deployments. Before the governance gate, we had three incidents where Claude Code pushed directly to main, triggering unintended production deployments. Since implementing the constraint, we have had zero. The constraint has been triggered (attempted pushes to main are logged) — it is not that the agent never tries, it is that the governance gate prevents the action from executing.
Governance trace coverage: 100%. Every action Claude Code takes is recorded in a governance trace. When we need to understand why a change was made, the trace provides complete context — not just the git history, but the governance evaluation that preceded each action. This has been invaluable for debugging and for onboarding new team members who want to understand decision history.
Constraint refinement cadence: weekly. We review and adjust constraints weekly based on escalation patterns and shadow mode data. The constraint set is a living document — not static policy, but evolving infrastructure that adapts to how we actually work. Roughly 15% of constraints are modified in any given month.
Developer satisfaction: high. This surprised us. We expected developers to resent the governance overhead. Instead, the consistent feedback is that governance constraints reduce cognitive load. Instead of remembering "do not push to main, do not modify the payment module without review, do not commit without updating docs," the governance gate handles it. Developers focus on the work; the infrastructure handles the governance.
The strongest validation is simple: **we would not remove it.** Even if Constellation were not a governance product, we would keep the governance gate running on our own development workflow. It makes us faster, not slower, because it eliminates the cognitive overhead of manual governance and the recovery cost of governance failures.
What This Means for You
Dog-fooding is not just a product development technique. It is a credibility mechanism. When we tell customers that governance infrastructure can govern AI agents without destroying developer productivity, we are not speculating — we are reporting.
The specific constraints we enforce are less important than the pattern. Every organisation that deploys AI agents will have different constraints reflecting their institutional context. A financial services firm will constrain trading agents differently than a software company constrains coding agents. The constraints are specific; the infrastructure is general.
If you are running AI agents today — whether Claude Code, GitHub Copilot in agent mode, custom LLM pipelines, or any other autonomous system — you can start with three steps. First, write down the constraints you currently enforce mentally ("do not push to main", "do not spend more than $500 without approval"). Second, ask yourself: if you were not watching, would those constraints still be enforced? Third, if the answer is no, you need governance infrastructure.
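Step one can be as simple as turning the mental checklist into structured data. Every field name below is illustrative, a starter shape rather than any particular product's schema:

```python
# Hypothetical starter constraint set: field names are illustrative only.
STARTER_CONSTRAINTS = [
    {
        "rule": "no-direct-push-to-main",
        "applies_to": "shell_command",
        "pattern": r"git\s+push\s+origin\s+main",
        "verdict": "block",
    },
    {
        "rule": "spending-cap",
        "applies_to": "spend_usd",
        "threshold": 500,
        "verdict": "escalate",
    },
]

def enforced_without_you(constraint: dict) -> bool:
    """Step two as a predicate: a constraint counts as governed only if it
    has a machine-checkable trigger, not just a human intention."""
    return "pattern" in constraint or "threshold" in constraint
```

If any entry in your list fails that predicate, it is a norm you enforce by watching, and step three applies.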
The gap between "constraints I enforce when I am paying attention" and "constraints that are enforced at the moment of action regardless of whether I am watching" is the governance gap. Closing it is not a luxury. It is a prerequisite for trusting AI agents with consequential work.