When Knowledge Becomes Control

I ran 265 experiments testing whether instructions change AI agent behavior. Most instructions don’t.

The instructions weren’t bad. Agents weren’t ignoring them. Instructions and governance are different things, and I’d been confusing them for months. Instructions live in attention. Governance lives in the system. Attention is the wrong substrate for durable control.

The test I arrived at: does the knowledge still act when the agent is too focused on its task to remember it? If not, it’s not governance. It’s context that might get read.

Here’s how I got there.

The Bypass Rate

Take rules that agents are supposed to follow — documentation updates after changes, size limits on files, verification steps before closing work — and check whether agents actually do them.

Across 265 controlled trials: behavioral rules have flat compliance regardless of emphasis, repetition, or positioning in context. You can bold them. Put them at the end. Repeat them three times. Compliance doesn’t meaningfully change.

The canonical example: I had a rule that files over a certain size should be split before adding more code. The rule existed. It was in the system prompt. Over a two-week measurement window, the bypass rate was 100%. Not one agent, across dozens of runs, changed its behavior because of that rule.

The rule existed. The rule did nothing.

What Works, What Doesn’t, and Where the Boundary Is

Not all instructions fail equally. The 265 trials broke into three categories:

Knowledge transfers. Routing tables, vocabulary, factual content the agent wouldn’t otherwise have — these produce a consistent lift. When you tell an agent “here’s how the system works,” it uses that information. This isn’t governance. It’s just giving the agent better inputs.

Stance transfers sometimes. Instructions that change where agents look — not what they do, but what they notice — work on specific problems. “Look for implicit assumptions between sources” took one scenario from 0% to 83% detection. But the mechanism is narrow: “look for X” works. “Do X” doesn’t. Stance primes attention. It doesn’t override behavior.

Compliance doesn’t transfer at scale. This is the big finding. Behavioral constraints — MUST, NEVER, ALWAYS — become inert past about five co-resident rules. At 10+, they’re noise. I tested this with a contrastive framework (sketched below): same scenario, three conditions — bare agent with no document, agent with knowledge only, agent with full skill document including 87 behavioral constraints across 2,368 lines. On 5 of 7 test scenarios, the 87-constraint version performed identically to the bare agent.
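A minimal sketch of that setup, with hypothetical names (run_scenario, score, the condition documents) standing in for whatever harness and grader you actually use; the point is that the only variable across conditions is the document handed to the agent:

```python
# Hypothetical contrastive harness: scenario and scoring stay fixed;
# the only thing that varies per condition is the document the agent receives.
CONDITIONS = {
    "bare": None,                      # no document at all
    "knowledge_only": "knowledge.md",  # facts and routing tables, no rules
    "full_skill": "skill.md",          # knowledge plus 87 behavioral constraints
}

def run_trials(scenarios, run_scenario, score):
    """run_scenario(scenario, document) and score(scenario, output) are
    whatever your agent runner and external grader happen to be."""
    results = {name: [] for name in CONDITIONS}
    for scenario in scenarios:
        for name, document in CONDITIONS.items():
            output = run_scenario(scenario, document=document)
            results[name].append(score(scenario, output))  # external grading, not self-report
    return results
```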

The boundary is precise: instructions that give the agent something (knowledge, attention direction) work. Instructions that ask the agent to override its task-focus (pause, verify, limit, check) fail under load. Advisory fails exactly where governance matters most — when the task is hard enough that the agent’s full attention is on the problem.

And it feels like all your rules work, because agents produce good output most of the time regardless. You write a rule, agents do reasonable things, you credit the rule. The agents were going to do reasonable things anyway. The rule is superstition.

Attention Is the Wrong Substrate

The three-way split — knowledge, stance, compliance — points at something structural. It’s not that agents are unreliable. It’s that attention is the wrong place to put behavioral control.

Knowledge works because it adds to what the agent can do. Stance works (narrowly) because it redirects attention the agent was already spending. Compliance fails because it asks the agent to subtract from its task-focus — to interrupt productive work and do a secondary thing. Under cognitive load, subtraction loses.

This means the entire category of behavioral constraint — the stuff that matters most for system integrity — cannot live in context. It has to live somewhere else.

The Harder Problem

But here’s what took me longer to see: even if you solved the compliance problem — even if every agent perfectly followed every instruction — you’d still need structural governance.

Because there are two failure modes, not one. Compliance failure is when agents don’t follow instructions. Coordination failure is when every agent follows every instruction and the system still degrades.

I watched this happen over twelve weeks. Thirty agents committed to the same file. Each commit was individually correct — a reasonable feature addition, well-tested, well-scoped. Nobody violated any rule. The file grew from 667 lines to 1,559 lines. Thirty locally correct contributions composed into structural degradation.

This is the accretion problem: when multiple amnesiac agents work on shared infrastructure, each making reasonable decisions without knowledge of the others, the system drifts toward entropy. Not because any agent failed. Because structural coordination isn’t an instruction — it’s a property of the system around the agents.

And it gets worse with better models. More capable agents make more locally correct contributions faster. Compliance failure shrinks. Coordination failure accelerates. The trajectories are opposite. Improving the agents makes the structural problem harder, not easier.

That’s why “write better prompts” is not a path to governance, even in principle. The problem isn’t that agents don’t listen. The problem is that listening isn’t enough.

Two Kinds of Rules

That realization forced a distinction: there are advisory constraints and structural constraints, and they’re fundamentally different things.

Advisory constraints live in context the agent reads. They depend on attention, which means they compete with the task. Under load, the task wins.

Structural constraints change what’s possible before the agent starts. The routing changes. The available actions narrow. The evaluation happens externally. The agent doesn’t comply with the rule — the rule already shaped the world the agent operates in.

| Advisory | Structural |
| --- | --- |
| “Don’t add code to files over 1500 lines” | Route the task to a different workflow when the target file is over 1500 lines |
| “Verify your work before marking complete” | An external check runs against the output; self-report is not accepted |
| “Include prior failure context” | Prior failure data is injected into the task before the agent sees it |
| “Follow the project style guide” | A pre-commit hook rejects non-conforming output |

The distinction isn’t about strictness. It’s about substrate. Advisory constraints live inside attention. Structural constraints live in infrastructure.
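To make the substrate difference concrete, here is a minimal sketch of the first table row, assuming a dispatcher that picks a workflow before any agent is spawned. The workflow names are illustrative, not from any particular framework; the agent never sees a rule about file size because the decision was made before it started.

```python
from pathlib import Path

MAX_LINES = 1500  # the same threshold the advisory rule asked agents to respect

def line_count(path: str) -> int:
    return sum(1 for _ in Path(path).open())

def route_task(target_file: str) -> str:
    # Structural version of "don't add code to files over 1500 lines":
    # oversized targets go to a split-first workflow before any agent starts.
    # No agent has to remember, or even see, the rule.
    if line_count(target_file) > MAX_LINES:
        return "split_then_extend"  # hypothetical workflow name
    return "extend_in_place"
```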

When Knowledge Becomes Control

Every agent system accumulates knowledge. Memory files. Investigation notes. Decision records. Lessons learned. The pile grows. It feels like progress. But almost none of it changes behavior structurally. It’s just a growing collection of things agents might read.

Knowledge becomes control — becomes governance — at a specific transition point: when it enters a mechanism that shapes future work without relying on the agent’s attention, memory, or compliance.

The mechanism classes I’ve found that actually work:

Routing — Knowledge about past failures changes which agent handles the task, with what skill, at what priority. The agent doing the work never needs to know why it was selected. The knowledge acted before the agent started.
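One possible shape for that, assuming failures are logged as JSONL records with a task_type field; the file path, agent profiles, and escalation threshold are all illustrative:

```python
import json
from pathlib import Path

FAILURE_LOG = Path("governance/failures.jsonl")  # hypothetical location

def failures_for(task_type: str) -> list[dict]:
    if not FAILURE_LOG.exists():
        return []
    records = [json.loads(line) for line in FAILURE_LOG.read_text().splitlines() if line.strip()]
    return [r for r in records if r.get("task_type") == task_type]

def select_assignment(task_type: str) -> dict:
    # Repeated failures of this task type upgrade the assignment: a different
    # agent profile, an extra skill, a higher review priority. The agent that
    # does the work never needs to know why it was selected.
    if len(failures_for(task_type)) >= 3:
        return {"agent": "senior", "skills": ["debugging"], "priority": "high"}
    return {"agent": "default", "skills": [], "priority": "normal"}
```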

Spawn shaping — Knowledge about prior attempts is injected as structured context. Not “remember to check X” but the actual failure data, formatted so the agent starts from where the last one broke, not from zero.
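A sketch of what that injection might look like, assuming each prior attempt was recorded with a failed_step, an error, and a ruled_out list; the field names and formatting are illustrative:

```python
def build_spawn_context(description: str, prior_attempts: list[dict]) -> str:
    # Not "remember to check X" but the actual failure data, structured so the
    # next agent starts from where the last one broke instead of from zero.
    if not prior_attempts:
        return description
    last = prior_attempts[-1]
    return "\n".join([
        description,
        "",
        "PRIOR ATTEMPTS (injected automatically):",
        f"- attempts so far: {len(prior_attempts)}",
        f"- last failure point: {last['failed_step']}",
        f"- error observed: {last['error']}",
        f"- already ruled out: {', '.join(last['ruled_out'])}",
    ])
```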

Capability restriction — Knowledge about what goes wrong in certain contexts removes those options. The agent can’t pile code onto a bloated file because the task was already routed away from that file. The gate doesn’t persuade; it redirects.
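A sketch of that kind of gate, assuming the orchestrator decides the tool set before spawning; the tool names and the 1500-line threshold are illustrative:

```python
def allowed_tools(target_file: str, line_counts: dict[str, int]) -> set[str]:
    tools = {"read_file", "write_file", "run_tests"}
    # When the target file is already bloated, the option to append to it is
    # simply not offered. The gate doesn't persuade; it removes the path that
    # past failures came from.
    if line_counts.get(target_file, 0) > 1500:
        tools.discard("write_file")
        tools.add("create_new_module")  # hypothetical narrower capability
    return tools
```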

Structural attractors — Architecture that makes the right path the default path. When I created a dedicated package for spawn-related code, the monolithic file shrank by 1,755 lines. No rule told agents to put code there — the package name primed placement. Architecture doing the work of instruction. Nobody had to read a rule. The structure existed, so code went there.

External verification — Knowledge about what “done” looks like feeds a check that runs outside the agent’s self-assessment. The agent says it’s finished. An external process tests whether that’s true.
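A minimal sketch, assuming the project’s test suite is the external check; in practice the check can be anything that does not depend on the agent’s own report:

```python
import subprocess

def verify_done(agent_claims_done: bool, repo_path: str) -> bool:
    # The agent's self-report is recorded but never used as the gate.
    # "Done" means the check that runs outside the agent's context passes.
    check = subprocess.run(
        ["python", "-m", "pytest", "--quiet"],
        cwd=repo_path,
        capture_output=True,
    )
    passed = check.returncode == 0
    if agent_claims_done and not passed:
        print("agent reported done; external check disagrees")
    return passed
```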

In each case: the knowledge acts on the system, not on the agent. The agent doesn’t need to remember the knowledge, agree with it, or even know it exists.

The Compounding Effect

Here’s why this matters beyond “write better rules”: structural constraints can compound. Advisory constraints can’t.

When routing learns from a failure, every future task of that type gets handled differently — permanently. When spawn context carries prior-attempt data, the next agent starts from a higher baseline — automatically. When an architectural attractor redirects code placement, every future contribution follows the new path without anyone reading a rule.

Advisory rules can’t compound because there’s nothing to accumulate in. You can write a hundred more warnings in the system prompt. They don’t stack. They compete for the same limited attention.

But I want to be honest about something: the form of the structural constraint matters enormously. I’ve tested this. A crude gate that blocks code changes based on pattern-matching known failures is structural — but it’s destructive. It can’t tell the difference between “making the same mistake” and “touching the same code for a different reason.” In controlled experiments, agents working under blunt structural enforcement performed worse than agents with no governance at all.

Bad governance is worse than no governance. The structural constraints that compound are the ones that change the agent’s operating conditions (routing, architecture, verification) rather than the ones that try to police the agent’s output after the fact. The distinction isn’t just advisory vs structural. It’s whether the constraint works with the shape of the problem or against it.

The Ratchet

The only structure I’ve found that solves “how does a system improve over time without collapsing into self-reinforcing loops” is the scientific method: measure externally, record honestly, inject contextually, prevent backsliding into previously-falsified positions.

That sounds like a metaphor. It isn’t. Every structural governance mechanism I’ve described is literally one of those four operations: external verification is the measurement, the failure records are the honest log, spawn shaping is the contextual injection, and routing and capability restriction are what prevent backsliding into approaches that have already failed.

Most agent systems have memory. Few have governance. The difference is whether what the system learned actually constrains what happens next — or just sits in a file that future agents might read.


I’ve been running this system in production for 18 months — 50+ agents per day on a shared codebase. The structural governance mechanisms above emerged from repeated failure: I measured what worked, discarded what didn’t, and built the infrastructure that actually changes behavior. The advisory stuff is still there. I just stopped counting on it.