Harness Engineering · Case Study

The Dark Factory

Inside OpenAI's Symphony experiment. Three engineers. Five months. One million lines of production code. Zero human authorship. Zero pre-merge review.

1M+ Lines of Code
1B Tokens / Day
<60s Build Loop
The Old Model
Humans write. Humans review. Humans merge.
Code generation is the bottleneck. Every PR needs human approval. Agents are "assistants" that propose diffs for humans to accept or reject.
Sequential · Review-Gated
What Is A Dark Factory
Agents write, review, and merge. Humans run capability analysis.
Ryan Lopopolo's team at OpenAI Frontier ran a five-month experiment. Three engineers, one empty repo, one rule: nobody writes code by hand. They built an Elixir orchestrator called Symphony and shipped 1M+ lines of production code across 1,500+ PRs, all autonomously merged. Humans did not review PRs. They watched patterns and fixed the harness.
Symphony · Elixir Orchestrator · Post-Merge Review
Mental Model
A factory with the lights off
Manufacturing's "dark factory" runs without humans on the floor. Machines do the work. Humans design the process. The software version: agents own the keyboard, humans own the specification.
Specs > Source
60-Second Build
Build speed is clock speed
Rule: builds complete in under 60 seconds, always. Every idle minute across 20 parallel agents is a minute of burned tokens.
Make · Bazel · Turbo · NX
500+ NPM packages
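The 60-second rule can be enforced mechanically. Below is a minimal sketch of a CI guard that runs the build and fails if it blows the budget; the function name and budget constant are illustrative, not part of Symphony.

```python
import subprocess
import sys
import time

BUILD_BUDGET_SECONDS = 60  # the dark-factory rule: every build finishes in under a minute


def timed_build(cmd: list[str], budget: float = BUILD_BUDGET_SECONDS) -> float:
    """Run a build command and fail loudly if it exceeds the time budget."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # raises if the build itself fails
    elapsed = time.monotonic() - start
    if elapsed > budget:
        raise RuntimeError(f"build took {elapsed:.1f}s, budget is {budget}s")
    return elapsed
```

Wiring this into CI turns build latency from a vibe into a hard failure, the same way a broken test would be.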
Post-Merge Review
No human PR gate
Code merges if CI passes. Humans sample output after the fact, looking for patterns, not individual defects. An agent-reviewing-agent pipeline runs P0/P1/P2 triage.
P0 Block · P1 Flag · P2 Inform
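The triage policy can be sketched in a few lines. This is a hypothetical model of the P0/P1/P2 scheme described above, not Symphony's actual code; the type names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    P0 = "block"   # merge is blocked until the finding is fixed
    P1 = "flag"    # merge proceeds; an issue is filed for follow-up
    P2 = "inform"  # merge proceeds; noted in the review log


@dataclass
class Finding:
    severity: Severity
    message: str


def merge_allowed(findings: list[Finding]) -> bool:
    """Post-merge-review policy: only P0 findings stop the line."""
    return all(f.severity is not Severity.P0 for f in findings)
```

The key property is asymmetry: only the highest severity gates the pipeline, so most reviewer-agent findings never block throughput.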
Economics
Token billionaire math
~1 billion tokens consumed per day. The daily spend is real, but still cheaper than the three to seven engineers the same output would otherwise require.
$2-3K / day · $60-90K / mo · 3-7 engineers
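A quick back-of-the-envelope check confirms the figures in this section are internally consistent:

```python
# Arithmetic check on the numbers quoted above (all figures from the section).
tokens_per_day = 1_000_000_000                     # ~1B tokens/day
daily_spend_low, daily_spend_high = 2_000, 3_000   # $2-3K/day

# Implied blended price per million tokens:
per_million_low = daily_spend_low / (tokens_per_day / 1_000_000)
per_million_high = daily_spend_high / (tokens_per_day / 1_000_000)
print(per_million_low, per_million_high)  # 2.0 3.0 (dollars per 1M tokens)

# Monthly spend, assuming 30 days:
print(daily_spend_low * 30, daily_spend_high * 30)  # 60000 90000
```

$2-3 per million tokens blended, and $60-90K a month, which is indeed in the salary range of a handful of senior engineers.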
PR Velocity
The model upgrade compounds
Humans did not work faster. The models did. Each generation turned previously hard tasks into routine ones.
Pre-5.2: 3.5 PRs/eng/day · Post-5.2: 5-10 PRs/eng/day
Symphony Stack · Elixir Orchestrator
Six layers that turn 20 parallel agents into a factory
L6 Observability: Prometheus, Jaeger, Grafana. Traces, logs, metrics, dashboards.
L5 Integration: GitHub PRs, Linear issues, Slack, observability APIs.
L4 Execution: Task runners, skills, CLI invocations. Where agents actually work.
L3 Coordination: Elixir process supervision. Lifecycle, restart, isolation.
L2 Configuration: Environment setup, tool exposure, blast-radius control.
L1 Policy: Hard guardrails. CI must pass. Security rules are non-negotiable.
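The L1 policy layer is the simplest to picture: a set of hard guardrails every merge must clear, with no override path. A minimal sketch, assuming a PR is a plain dict; the field names and guardrail list are illustrative, not Symphony's actual schema.

```python
from typing import Callable

# Each policy is a predicate over a PR; all of them must hold for a merge.
Policy = Callable[[dict], bool]

HARD_GUARDRAILS: list[Policy] = [
    lambda pr: pr["ci_status"] == "green",  # CI must pass, always
    lambda pr: not pr["touches_secrets"],   # security rules are non-negotiable
]


def may_merge(pr: dict) -> bool:
    """Merge only when every hard guardrail passes; there is no override path."""
    return all(rule(pr) for rule in HARD_GUARDRAILS)
```

Keeping policy as the bottom layer matters: everything above it (coordination, execution, integration) can fail or be rewritten, but the guardrails do not move.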
The Role Shift
From code review to capability analysis
Before → After
Review every PR line by line → Sample outputs for patterns
"Is this code correct?" → "Why did the agent fail here?"
Gate merges sequentially → Merge on green CI, observe
Fix bugs in the PR → Fix capability gaps in the harness
Bottleneck: human attention → Bottleneck: specification quality
The new question: not "did the agent write the right code" but "does the harness give the agent everything it needs to write the right code."
Ghost Libraries
Distribute specs, not source code
Ship a specification. The agent reads it and reproduces the library locally, tailored to your codebase. No version conflicts. No shared source. No supply chain attacks.
Speculative · Not hypothetical
Still Human Territory
What agents cannot do yet
Agents follow patterns. They do not invent them.
The Core Insight
The only fundamentally scarce thing is synchronous human attention. Models are trivially parallelizable.
— Ryan Lopopolo, OpenAI Frontier Product Exploration
Translation: if you can run 20 agents in parallel and each produces working code, the bottleneck is not authorship. It is the quality of the specifications and constraints they receive. That is harness engineering.
What To Adopt Today · Any Team Size
Five dark-factory practices that work at solo-dev scale too
Practice → Why It Matters → Smallest Version
1. Measure the build loop → Build time is the clock speed of every agent you run. → Fix anything over 2 minutes.
2. Encode taste as text → Agents consume CLAUDE.md, specs, and quality scores as context. → Turn tribal knowledge into markdown.
3. Consider post-merge for low-risk → Not every PR needs a human gate. Observability replaces review. → Auto-merge on green CI for docs and tests.
4. Treat code as disposable → If the spec is good, regenerating is cheaper than defending. → Throw away, do not merge-conflict-resolve.
5. Invest in agent observability → You cannot fix capability gaps you cannot see. → Log every agent action with structured traces.
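The smallest version of practice 5 is a one-function logger. A sketch of structured tracing for agent actions, assuming one JSON object per line; the field names are illustrative:

```python
import json
import time


def log_agent_action(agent_id: str, action: str, **fields) -> str:
    """Emit one JSON object per agent action: a minimal structured trace line."""
    record = {"ts": round(time.time(), 3), "agent": agent_id, "action": action, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)  # in production this would feed a log pipeline, not stdout
    return line


# Example: an agent opening a PR, with arbitrary extra fields attached.
log_agent_action("agent-07", "open_pr", repo="symphony", pr=1501)
```

Because every line is machine-parseable, "why did the agent fail here?" becomes a query over traces rather than an archaeology exercise.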
Source: Latent Space / Ryan Lopopolo · OpenAI Frontier · sangampandey.info Dark Factory