Blog
Inside OpenAI's Dark Factory: 1M Lines of Code, Zero Human Authorship
Ryan Lopopolo's team at OpenAI shipped 1 million lines of production code with zero human-written source. Here is what their Symphony system, ghost libraries, and post-merge review workflow actually look like.
The Static-First Prompt Architecture
Prompt caching is not a feature you toggle on. It is an architectural constraint. Here is the layered structure and 4-breakpoint strategy that makes it work reliably.
The Physics of Prompt Caching
Prompt caching can cut agent API costs by up to 90%. Here is how the KV cache actually works, what breaks it, and how to read the numbers.
OpenAI's Harness Engineering Explained
Harness Engineering is how OpenAI shipped 1 million lines of production code in 5 months with 7 engineers and no manually written source code.
Building a Long-Running AI Agent Harness
Every new Claude Code session starts with no memory of the last. Anthropic's engineering team built a two-part harness that fixes this. Here is what it does and how to implement it.
Evaluating AI Agent Skills with Skill Eval
You write CLAUDE.md files and hope the agent follows them. Minko Gechev's Skill Eval framework treats agent skills like code, with unit tests, scoring, and CI integration that catches regressions before they ship.
Why Agentic Apps Cost So Little to Build and So Much to Run
Building with AI agents has never been cheaper. Running them in production is another story. Here is the hidden cost paradox of agentic development and the context loop that solves it.