Building Blind
I’m building orch-go. It’s a multi-agent orchestration system. It’s in Go. I can’t read Go.
Not “I’m rusty in Go.” I’ve never written Go. I studied biochemistry, learned to code by gluing Rails apps together for sandwich factory data collection, and spent 12-13 years doing pragmatic, StackOverflow-driven development. JavaScript and Ruby I can read. Go I cannot.
In two months, the project has accumulated 1,933 commits. The status calculation alone is ~1,400 lines of Go. I have no idea what most of it says. But the system works — spawns agents, tracks state, enforces governance, runs a dashboard. I built it by telling AI agents what to build, watching what happened, and fixing what broke.
That’s the setup. Here’s what I learned about understanding systems you can’t read.
The Problem
When I was learning web development, Ruby on Rails models were essential. Open `app/models/user.rb`, see `has_many :posts`, `belongs_to :organization`. That code structure helped me understand how the system worked. Models weren’t just code — they were how I formed mental models.
Building orch-go has been different. Agent state is distributed across four independent systems. I can’t read the implementation to understand how it works.
So I navigated by spawning investigations. Agent shows unexpected behavior? Spawn an agent to observe what happens. Dashboard displays wrong state? Spawn investigation to trace the data flow. Each investigation produces a document — behavioral observations, what I tried, what I learned.
By January, I had 150+ investigations. But observations without synthesis don’t form understanding. Each new problem felt fresh even when I’d investigated similar issues before. I was building production infrastructure while blind to how it worked.
The Hypothesis
I realized I needed something like Rails models — but built from behavioral observations instead of code.
On January 12, I synthesized five models from those 150+ investigations. Each model distilled 20-40 investigations into a core mechanism, failure modes, constraints, and evolution history.
The agent lifecycle model is a good example. Over 18 days, I’d spawned 17 investigations observing agent state:
- “Dashboard shows active, but beads says closed”
- “Agent marked dead, but work is actually done”
- “Session went idle, but no Phase: Complete comment”
The model revealed what I couldn’t see by reading code: agent state exists across four layers with different authority levels.
| Layer | Storage | Authority |
|---|---|---|
| Beads comments | .beads/issues.jsonl | Highest (canonical) |
| OpenCode on-disk | ~/.local/share/opencode/ | Medium (historical) |
| OpenCode in-memory | Server process | Medium (operational) |
| Tmux windows | Runtime | Low (UI only) |
Now I could query the model instead of investigating from scratch: “Why does dashboard show active when beads says closed?” Answer: caching lag, refresh browser. “Can I trust session idle = complete?” Answer: no, sessions go idle for many reasons.
Same trick as Rails, just built from observations instead of code. I thought I’d cracked it.
What Actually Happened
I was wrong. Or at least, my hypothesis was incomplete.
I wrote the first draft of this post on January 12 and said I’d test the hypothesis for two weeks. Two months later, the evidence tells a different story than I expected.
Those original five models? They sat largely unused. Zero probes. Two cross-references across the entire knowledge base. Investigation activity flatlined — not because models answered my questions, but because I’d moved on to building new things faster than I could query old models.
I could read them and understand what had happened. But when new problems came up, I didn’t reach for the models. I spawned new investigations. Same as before.
What Actually Worked
Something else emerged. Starting in late February, a new generation of models appeared that worked differently from the January batch.
The behavioral-grammars model is the clearest example. Unlike my original models, it:
- Has 9 probes attached — experiments that test specific claims
- Explicitly labels unvalidated claims as “directional hypothesis pending re-measurement”
- Has 17+ cross-references — other artifacts actually cite it
- Gets updated when probes contradict it
My January models said “here’s how the system works.” The February models say “here’s what we think happens, here’s how we’re testing it, here’s where we’re wrong.”
| January models | February models |
|---|---|
| Synthesized from past investigations | Emerged from observed failures |
| Treated as finished documents | Treated as living hypotheses |
| Zero probes attached | 9 probes testing claims |
| 2 cross-references | 17+ cross-references |
| Sat unused | Actively queried and updated |
The January models were too confident. They described mechanisms without infrastructure to verify the descriptions. The February models work because they’re honest about what they don’t know.
The Rails Parallel, Revised
In Rails:
```ruby
class User < ActiveRecord::Base
  has_many :posts
  belongs_to :organization
end
```
Read this, understand: “A user has many posts and belongs to one organization.” The code is the truth. You can trust it.
I tried to replicate this with behavioral models — write down how the system works, trust the document. But behavioral models aren’t code. Code is self-verifying; it runs or it doesn’t. A model that says “completion authority is beads, not sessions” might be wrong, and nothing catches the error until you hit it in production.
What actually works is closer to how a scientist uses models: state a hypothesis, design an experiment, update the model when reality disagrees.
What Transferred
From 12-13 years of pragmatic software work, I brought:
- Models structure understanding (from Rails)
- When multiple things claim to be the source of truth, you need to know which one wins (from database work)
- When state lives in multiple places, it will disagree (from distributed systems)
- Build, fail, extract principles (from shipping things that break)
None of that is Ruby-specific or Go-specific. It’s systems knowledge — how to think about state, authority, verification, and modeling.
That knowledge transferred to orchestrating Go. Not Go syntax. Not implementation patterns. But the habit of thinking: when you can’t hold complexity in your head, build structures to hold it for you. And when those structures stop working, notice that they stopped working rather than trusting them on faith.
What This Means
The conventional worry is: “What happens when AI writes all the code? Don’t you need to understand the implementation?”
I’ve been testing this for months.
You don’t need to know Go to build in Go. I’m evidence of that — 1,933 commits, can’t read any of them. What you need is the stuff I picked up in 12 years of shipping things that break: how to decompose problems, how to verify behavior, how to notice when your mental model is wrong.
But you can’t just write down how the system works and call it understanding. My January models proved that. I had the explanations. I didn’t use them.
What actually works is a loop: build something, observe how it fails, form a hypothesis about why, test the hypothesis, update your model. The models are essential — but as hypotheses to be tested, not conclusions to be trusted.
Knowing Ruby doesn’t help you read Go. But the habit of testing claims against reality — that transfers to anything.
I’m still building blind. But I’m getting better at it. Not because my models got more complete, but because my process for discovering where they’re wrong got faster.