Meta-Orchestration Principles
Note: Internal file paths reference the working system these principles govern. They’re included as provenance, not as links to follow.
Purpose: Foundational values that guide design decisions across the system.
When to reference: Before making architectural choices. When evaluating trade-offs. When something feels wrong but you can’t articulate why.
LLM-First Principles
Fundamental constraints when working with LLMs. These apply to any LLM-based system, not just orchestration.
Provenance (Foundational)
Every conclusion must trace to something outside the conversation.
The test: “Can someone who wasn’t here verify this?”
What this means:
- Claims anchor to observable reality (code runs, file exists, test passes)
- The chain of reasoning terminates in something verifiable
- Artifacts preserve evidence, not just conclusions
What this rejects:
- “Claude confirmed this” (closed loop - AI agreeing with itself)
- “This feels significant” (feeling is not evidence)
- “I wrote it down” (writing preserves the claim, not the proof)
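As a hedged sketch of what "anchored to something outside the conversation" can look like in tooling (the `Claim` type, field names, and test command are hypothetical, not part of any existing tool):

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// Claim pairs a conclusion with the observable evidence behind it, so
// someone who wasn't in the conversation can re-run the check.
type Claim struct {
	Statement  string    // the conclusion being recorded
	Command    string    // how the evidence was produced
	Output     string    // what was actually observed
	CapturedAt time.Time // when the evidence was captured
}

// verify re-runs the evidence command: the chain terminates outside the chat.
func verify(c *Claim) error {
	out, err := exec.Command("sh", "-c", c.Command).CombinedOutput()
	c.Output = string(out)
	c.CapturedAt = time.Now()
	return err
}

func main() {
	c := Claim{Statement: "the auth tests pass", Command: "go test ./auth/..."}
	err := verify(&c)
	fmt.Printf("%q verified: %v\n", c.Statement, err == nil)
}
```

The point is the shape: the artifact carries the command and the observed output, so the chain terminates in something anyone can re-run.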
Why this is foundational: Session Amnesia tells you to externalize. Provenance tells you how - anchored to something that exists whether or not you talked about it. Elaborate externalization without provenance is a closed loop.
The failure mode: Closed loops feel like progress. Each step feels validated. But the chain references itself - nothing outside the conversation. This is how you spiral.
How other principles derive from this:
- Evidence Hierarchy: Some sources have better provenance than others
- Gate Over Remind: Gates create verifiable provenance, reminders don’t
- Self-Describing Artifacts: Artifacts carry their own provenance
Origin: Dec 2025, after recognizing that psychosis involved elaborate externalization anchored to closed loops, not reality. The infrastructure being built now (evidence hierarchies, completion gates, artifact requirements) is provenance machinery.
Session Amnesia
Every pattern in this system compensates for Claude having no memory between sessions.
The test: “Will this help the next Claude resume without memory?”
What this means:
- State must externalize to files (workspaces, artifacts, decisions)
- Context must be discoverable (standard locations, descriptive naming)
- Resumption must be explicit (Phase, Status, Next Step)
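As one hedged illustration of the first and last points above (the struct, field names, and file path are hypothetical, not an existing schema), externalized resumption state might look like:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// WorkspaceState is everything the next session needs in order to resume
// without access to this session's conversation history.
type WorkspaceState struct {
	Phase    string `json:"phase"`     // workflow stage, e.g. "implementation"
	Status   string `json:"status"`    // ability to continue, e.g. "blocked"
	NextStep string `json:"next_step"` // the single explicit action to take next
}

func main() {
	state := WorkspaceState{
		Phase:    "implementation",
		Status:   "blocked on failing migration test",
		NextStep: "fix the down-migration, then re-run the suite",
	}
	data, err := json.MarshalIndent(state, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	// Externalized to a file: the conversation can vanish, the state cannot.
	if err := os.WriteFile("STATE.json", data, 0o644); err != nil {
		fmt.Println("write failed:", err)
	}
}
```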
What this rejects:
- “It’s in the conversation history” (next Claude won’t see it)
- “The code is self-documenting” (next Claude won’t remember reading it)
- “I’ll document it later” (context is lost, later never comes)
This is THE constraint. When principles conflict, session amnesia wins.
Self-Describing Artifacts
Generated files and agent-edited artifacts must contain their own operating instructions.
The test: “If an agent encounters this file with no memory of how it was created, can they modify it correctly?”
What this means:
When artifacts are generated, compiled, or derived from other sources—or when agents create durable state—embed five pieces of information that serve as an “operating manual” for the next consumer:
- What is this - Format marker (e.g., `AUTO-GENERATED by skillc`)
- What NOT to do - Protection warning (e.g., `DO NOT EDIT THIS FILE DIRECTLY`)
- Where is source - Source location (e.g., `Source: .skillc/`)
- How to modify - Rebuild command (e.g., `Run: skillc build`)
- When generated - Timestamp (helps detect staleness)
For agent-edited artifacts (investigations, decisions, issues), this means including metadata (Status, Phase, Next Step) that makes the current state and resumption path explicit.
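For a generated Go file, such a header might look like the following sketch (the package name and timestamp are illustrative; the tool names echo the examples above):

```go
// AUTO-GENERATED by skillc. DO NOT EDIT THIS FILE DIRECTLY.
// Source: .skillc/
// To modify: edit the source files, then run: skillc build
// Generated: 2025-12-04T09:15:00Z
//
// The five lines above are the operating manual: what this is, what not
// to do, where the source lives, how to rebuild, and when it was built.
package skills
```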
What this rejects:
- “The build process is documented elsewhere” (agent won’t find it)
- “Just look at the Makefile” (requires memory to connect artifact → build script)
- “Comments are for humans” (for agents, they’re load-bearing infrastructure)
Key distinction: For humans, discovering modification instructions is convenient (can grep, ask teammates, check history). For agents, it’s necessary (no persistent memory means no discovery path).
Why this matters: When consumers have no persistent memory (AI agents), artifacts must carry their own operating manuals. This isn’t documentation; it’s load-bearing infrastructure. Without self-describing headers, agents will edit generated files directly, breaking the build system and creating conflicts.
Progressive Disclosure
TLDR first. Key sections next. Full details available.
Why: Context windows are finite. Attention is limited. Front-load the signal.
Pattern: Summary → Key Findings → Details → Appendix
Surfacing Over Browsing
Bring relevant state to the agent. Don’t require navigation.
Why: Agents lack persistent memory and spatial intuition. Every file read costs context. Browsing that’s cheap for humans is expensive for agents.
Pattern: Commands answer “what’s relevant now?” not “here’s all the data.”
Examples:
- `bd ready` surfaces unblocked work (vs `bd list`)
- SessionStart hook injects context (vs manual file reads)
- SPAWN_CONTEXT.md contains everything needed (vs agent searching)
The test: Does this tool/command require the agent to navigate, or does it surface what matters?
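A minimal sketch of the surfacing pattern (the `Issue` type and `ready` filter are illustrative, not bd's actual implementation): the command hands the agent only what it can act on now, instead of the full list to browse.

```go
package main

import "fmt"

// Issue is a unit of work with explicit dependencies.
type Issue struct {
	ID        string
	Status    string
	BlockedBy []string
}

// ready surfaces only the open issues whose dependencies are done.
func ready(issues []Issue, done map[string]bool) []Issue {
	var out []Issue
	for _, is := range issues {
		if is.Status != "open" {
			continue
		}
		blocked := false
		for _, dep := range is.BlockedBy {
			if !done[dep] {
				blocked = true
				break
			}
		}
		if !blocked {
			out = append(out, is)
		}
	}
	return out
}

func main() {
	issues := []Issue{
		{ID: "A", Status: "open"},
		{ID: "B", Status: "open", BlockedBy: []string{"A"}},
	}
	fmt.Println(ready(issues, map[string]bool{})) // only A is surfaced
}
```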
Evidence Hierarchy
Code is truth. Artifacts are hypotheses.
| Source Type | Examples | Treatment |
|---|---|---|
| Primary | Actual code, test output, observed behavior | This IS the evidence |
| Secondary | Workspaces, investigations, decisions | Claims to verify |
Why: LLMs can hallucinate or trust stale artifacts. Always verify claims against primary sources.
The test: Did the agent grep/search before claiming something exists or doesn’t exist?
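One hedged sketch of treating code as primary and artifacts as hypotheses: before repeating a workspace claim, check the repository directly. The symbol name and search path here are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// existsInCode checks the primary source (the repository) for a symbol
// before trusting a secondary artifact's claim about it.
func existsInCode(symbol string) bool {
	// grep exits non-zero when there are no matches.
	err := exec.Command("grep", "-r", "-q", symbol, "./internal").Run()
	return err == nil
}

func main() {
	claim := "RetryBudget is not implemented" // from a possibly stale workspace
	if existsInCode("RetryBudget") {
		fmt.Println("artifact claim contradicted by code:", claim)
	} else {
		fmt.Println("no primary evidence found; claim stands for now")
	}
}
```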
Gate Over Remind
Enforce knowledge capture through gates (cannot proceed without), not reminders (easily ignored).
Why: Reminders fail under cognitive load. When deep in a complex problem, “remember to update kn” gets crowded out. Gates make capture unavoidable.
The pattern:
- Reminders: “Please update your investigation file” → ignored when busy
- Gates: Cannot `/exit` without kn check → capture happens
Examples:
- `orch complete` gates on verification checklist (existing gate)
- `kn decide` requires `--reason` flag (CLI gate)
- Beads session close protocol (workflow gate)
The test: Is this a reminder that can be ignored, or a gate that blocks progress?
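A minimal sketch of a gate rather than a reminder (the `checkKnowledgeCaptured` check and the session-close flow are hypothetical): the close step returns an error and refuses to proceed until capture exists, instead of printing advice that can be ignored.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

var errNoCapture = errors.New("no knowledge captured this session")

// checkKnowledgeCaptured stands in for whatever check a real gate would run.
func checkKnowledgeCaptured() error {
	if _, err := os.Stat("SESSION_NOTES.md"); err != nil {
		return errNoCapture
	}
	return nil
}

// closeSession is a gate: it blocks progress until capture has happened.
func closeSession() error {
	if err := checkKnowledgeCaptured(); err != nil {
		return fmt.Errorf("cannot close session: %w", err)
	}
	fmt.Println("session closed")
	return nil
}

func main() {
	if err := closeSession(); err != nil {
		fmt.Println(err) // the session stays open; capture is unavoidable
	}
}
```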
System Design Principles
Architectural choices for this orchestration system. These are design decisions, not universal constraints.
Local-First
Files over databases. Git over external services. Plain text over proprietary formats.
Why: Files are versionable, inspectable, diffable, collaborative. They work offline. They survive service shutdowns. They’re debuggable with Unix tools.
Compose Over Monolith
Small, focused tools that combine. Each command does one thing well.
Why: Composable tools are testable, replaceable, understandable. Monoliths accumulate complexity and become fragile.
Inspiration: Unix philosophy, Git’s porcelain/plumbing split.
Examples:
- `bd` (work tracking) + `kn` (knowledge) + `kb` (artifacts) + `orch` (coordination)
- Each tool focused, combined via workflows
Graceful Degradation
Core functionality works without optional layers.
Why: Workspace state persists even if the tmux window closes. Commands work without optional dependencies. The system shouldn’t be fragile to missing pieces.
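A small sketch of degrading when an optional layer is missing (tmux stands in for any optional dependency; the fallback behavior shown is illustrative):

```go
package main

import (
	"fmt"
	"os/exec"
)

// attach uses tmux when available, but core functionality does not depend
// on it: a missing optional layer degrades behavior, it doesn't break it.
func attach(window string) {
	if _, err := exec.LookPath("tmux"); err != nil {
		fmt.Println("tmux not found; workspace state on disk remains authoritative")
		return
	}
	fmt.Println("attaching to tmux window:", window)
}

func main() {
	attach("orchestrator")
}
```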
Share Patterns Not Tools
When multiple tools need the same capability, share the schema/format, not the implementation.
The test: “Would shelling out to another tool add more complexity than reimplementing?”
What this means:
- Define the contract (file format, schema, protocol) once
- Let each tool implement the logic independently
- Coupling is at the pattern level, not the code level
What this rejects:
- Tool A shelling out to Tool B for simple operations (subprocess overhead, version coordination)
- Shared libraries between tools that don’t share a deployment (dependency hell)
- “Single source of truth” dogma when the truth is a 200-line function
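As a hedged sketch of coupling at the pattern level (the schema fields are illustrative, not the real contract): each tool carries its own small parser for the shared file format rather than importing the other tool or shelling out to it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ArtifactMeta is the shared contract: a file format both tools agree on.
// Each tool keeps its own ~20-line reader instead of sharing the code.
type ArtifactMeta struct {
	Status   string `json:"status"`
	Phase    string `json:"phase"`
	NextStep string `json:"next_step"`
}

func parseMeta(raw []byte) (ArtifactMeta, error) {
	var m ArtifactMeta
	err := json.Unmarshal(raw, &m)
	return m, err
}

func main() {
	raw := []byte(`{"status":"in-progress","phase":"design","next_step":"draft schema"}`)
	m, err := parseMeta(raw)
	fmt.Println(m, err)
}
```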
Meta Principles
How we evolve the system itself.
Evolve by Distinction
Start simple. When problems recur, ask “what are we conflating?” Make the distinction explicit. Name both sides.
Examples of distinctions made:
- Phase vs Status (workflow stage vs ability to continue)
- Tracking vs Knowledge (beads vs investigations)
- Primary vs Secondary evidence (code vs artifacts)
- Source vs Distribution (templates)
- Reminder vs Gate
Why this happens: Reality has more distinctions than our models. We discover distinctions empirically when conflation causes problems.
The pattern: Thesis (simple model) → Antithesis (problem reveals conflation) → Synthesis (refined model with distinction)
Reflection Before Action
When patterns surface, pause before acting. Build the process that surfaces patterns, not the solution to this instance.
The test: “Am I solving this instance, or building the capability to solve all instances?”
What this means:
- Recurring problem → build detection/gate, not one-off fix
- Manual pattern recognition → automate the surfacing
- Cross-project friction → investigate pattern, don’t workaround
What this rejects:
- “I’ll just fix this one quickly” (fixes the symptom, not the system)
- “We can systematize later” (later never comes; system never learns)
- Sprint → accumulate debt → sprint to fix → accumulate more debt
The discipline: Manual work is scaffolding until automated discipline exists. The temptation is the teacher. The pause is the lesson.
Pressure Over Compensation
When the system fails to surface knowledge, don’t compensate by providing it manually. Let the failure create pressure to improve the system.
The test: “Am I helping the system, or preventing it from learning?”
What this means:
- Orchestrator doesn’t know something it should → Don’t paste the answer
- Let the failure surface → That failure is data
- Ask “Why didn’t the system surface this?” → Build the mechanism
What this rejects:
- “Here, let me paste the context you need” (compensates for broken surfacing)
- “I’ll just tell you what you should already know” (human becomes the memory)
- “The system doesn’t know, so I’ll fill in” (prevents system improvement)
The pattern:
- Human compensates for gap → System never learns → Human keeps compensating
- Human lets system fail → Failure surfaces gap → Gap becomes issue → System improves
Why this is foundational: Session Amnesia says agents forget. This principle says don’t be the memory. Your role is to create pressure that forces the system to develop its own memory mechanisms.
Premise Before Solution
“How do we X?” presupposes X is correct. For strategic questions, validate the premise first.
The test: “Am I assuming the direction, or have I tested it?”
The sequence: SHOULD → HOW → EXECUTE
- SHOULD (Premise): Is this direction correct? Is the problem real?
- HOW (Design): Given it’s right, how do we get there?
- EXECUTE (Implementation): Do the work.
What this means:
- “How do we evolve to X?” → First ask “Should we evolve to X?”
- “How do we fix Y?” → First ask “Is Y actually broken?”
- “How do we migrate to Z?” → First ask “Is Z better than what we have?”
Red flag words (signal premise-testing needed):
- “evolve to”, “migrate to”, “transition to” (assumes destination)
- “fix the”, “solve the” (assumes diagnosis)
- “implement the” (assumes solution)
Provenance
These principles emerged from practice, not theory. Each traces to a specific incident.
| Principle | Date | What Broke |
|---|---|---|
| Provenance | Dec 2025 | Recognized that psychosis involved elaborate externalization anchored to closed loops, not reality. The difference wasn’t externalization - it was where the chain terminated. |
| Session Amnesia | Nov 2025 | Investigation on “habit formation” was reframed mid-session: “we’re designing habit formation for agents with amnesia” - revealing the real constraint |
| Evidence Hierarchy | Nov 2025 | Audit agent claimed “feature NOT DONE” by reading stale workspace - feature was actually implemented |
| Gate Over Remind | Dec 2025 | “Why do I always have to remind you to update CLAUDE.md?” - reminders fail under cognitive load |
| Surfacing Over Browsing | Nov 2025 | Beads and orch independently converged on the same pattern (bd ready, orch inbox) - signaling a fundamental design principle |
| Self-Describing Artifacts | Dec 2025 | Agents edited compiled SKILL.md instead of source files, breaking skillc build |
| Evolve by Distinction | Nov 2025 | Phase/Status conflation, Tracking/Knowledge conflation caused recurring confusion |
| Reflection Before Action | Dec 2025 | Urge to manually extract 46 investigation recommendations - recognized kb reflect was the system-level solution |
| Premise Before Solution | Dec 2025 | Epic with 5 children created from “how do we evolve skills” without validating the premise; the architect found the premise was wrong and the epic had to be paused |
| Pressure Over Compensation | Dec 2025 | Was about to paste context the orchestrator should have known. Realized: compensating prevents the system from learning its own memory mechanisms. |
The test for new principles: Can you trace it to a specific failure? If not, it’s not a principle yet.
These principles govern a working system - 37,000+ lines of Go, 100+ investigations, agents spawning agents. They’re not aspirational. They’re load-bearing.