Over the past few weeks I've been the most productive I've ever been at coding.
And I know: I'm a professional middle manager, so you may be wondering whether that bar is especially high. What, did you drool twice as much on the keyboard last week?
Fair. But last week a customer told me one of my features was lame and wished it were better. I knew he was right. I just didn't know what to do about it, and it sounded like a lot of work. But hey — the customer is always right. So I started a conversation with Claude Code. Hopeful, but unsure.
A bit of iteration on the approach. A visit to my design bible to keep the change in shape. Some careful git navigation to cherry-pick to prod without disturbing the major release scheduled for the following week.
Thirty minutes later the upgrade was live. The customer loved it.
I didn't get better. The tools got better.
Other people have been telling these stories for months. I don't need to pile on. What I want to ask is why. What properties of programming made it the first domain to get this kind of lift — and what can other domains learn from it?
Here's the thing: humans have been talking to machines since the 1950s. And the entire history of software engineering has been about making that conversation tractable — at scale, across time, between a computer and a person, but also among groups of people who have never met and will never meet.
We didn't just invent a language for precisely telling a computer what to do. We built an entire stack around that language whose only real purpose is letting humans and machines reach a meeting of the minds about what we're actually building — and then letting the next human who shows up do the same thing, six months or six years later.
Source files. Version control. Pull requests. Test suites. Commit messages. Diffs. Merge conflicts as literal surfaced disagreement.
Every one of those is a tool for alignment. For decades, on millions of projects, humans have been externalizing the process of "what are we doing, why are we doing it, and is it actually right" into structured, inspectable, machine-readable artifacts.
And we did an absurd amount of it in public.
Every "oh god that bug was because I forgot to null-check" Slack thread. Every argument over Postgres vs. Mongo. Every half-finished PR that someone abandoned at 11pm and came back to the next morning with a better idea. Every "hotfix typo in comment" commit. Billions of them, on GitHub and Stack Overflow and mailing list archives, for twenty years.
When people talk about training data for code models, they picture the code. I think the code is almost a byproduct. What matters more is the ambient reasoning around the code — the arguments, the revisions, the meeting-of-the-minds process itself.
That's where the lift comes from. Not "write Python" but "participate in the decades-long conversation humans have been having about how to get computers to do the right thing."
So when a coding agent lands on a codebase, it's not "reading text and generating more text." It's landing in a dense, machine-readable structure — files, functions, types, tests, git history — designed for exactly the kind of collaboration it's now trying to do.
It has context it can inspect.
It has a workflow it can execute.
It has feedback signals — does the test pass, does the build succeed — that are precise, not vibes.
It has version control to roll back when it's wrong.
It has a social protocol ("here's a diff, here's why") that every programmer already knows how to receive.
None of that is model capability. All of it is layer. The layer was already there.
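It's worth dwelling on how unusual that feedback precision is. A minimal sketch of the loop an agent gets for free in code (the function and test here are invented for illustration):

```python
# A coding agent's feedback signal is binary and machine-checkable.
# Hypothetical feature plus the test that guards it — both invented.

def format_price(cents: int) -> str:
    """Render an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"

def test_format_price() -> bool:
    """The 'does the test pass' signal: precise, not vibes."""
    return (
        format_price(1999) == "$19.99"
        and format_price(5) == "$0.05"
    )

print(test_format_price())  # True/False — something an agent can act on directly
```

Nothing in the novel-writing workflow hands you a signal that crisp; the closest analogue is a human reader saying "this chapter feels off."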
Coding agents didn't invent collaboration with AI on complex work. They inherited a seventy-year head start.
Now consider writing a novel.
A novel is also a complex artifact. Characters, arcs, themes, pacing, dependencies spanning hundreds of pages. A good novelist holds all of it in their head while they type the next sentence. Fiction is not easier than code. In some ways it's much harder — the constraints are subjective, the feedback loop is slow, the "tests" are human readers having emotional responses.
But the layer around the novel barely exists, at least not in any form a machine can touch. There's no syntax tree of a story. No version control for a character arc. No test suite that fails when the protagonist silently contradicts, in chapter 14, the core personality trait introduced in chapter 3. No public archive of the conversations working authors have with themselves about why the thing isn't working yet.
The novel itself is effectively a binary. You see the compiled artifact. You don't see the outline, the cut subplots, the three versions of act three, the deleted POV character, the editorial margin notes. They exist — in Scrivener docs and Google Doc comments and the margins of printed drafts — but not as a structured, durable, inspectable layer. Not in public. Not at scale.
So if you wanted a coding-agent-for-novels, you'd start with a decompilation problem. Given only the binary, reconstruct the intent. Guess at every decision that produced it. Then try to make a coherent change in a system where the relationships between the parts exist only implicitly, in the head of the human you're ostensibly helping.
That's not a model problem. That's a layer problem.
Here's the claim I keep landing on: what separates code from other complex work isn't the text or the model. It's the ontology and the observability around it. Code had both almost by accident. Most domains have neither.
So here is what I think the next few years are actually about: not just bigger models, not just more agents, but a new generation of tools that build the missing layer for work that isn't code. Schemas for meaning, not just data. Revision history for ideas, not just text. Observability of decisions, not just outputs. The human/computer collaboration infrastructure, finally made explicit.
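To make "schemas for meaning" concrete: here is an entirely hypothetical sketch of what a test-suite-for-a-character-arc could look like if a novel had a machine-readable layer. Every name and structure below is invented; the point is only that the contradiction in chapter 14 becomes a failing check instead of a silent drift.

```python
# Hypothetical layer for fiction: characters as data, scenes as data,
# and a consistency check that fails loudly. All names are invented.

from dataclasses import dataclass, field

@dataclass
class Character:
    name: str
    traits: set[str] = field(default_factory=set)  # established in early chapters

@dataclass
class Scene:
    chapter: int
    character: str
    contradicts: set[str]  # traits this scene works against

def check_consistency(cast: dict[str, Character], scenes: list[Scene]) -> list[str]:
    """Fail when a later chapter contradicts a trait established earlier."""
    errors = []
    for scene in scenes:
        who = cast[scene.character]
        broken = scene.contradicts & who.traits
        if broken:
            errors.append(f"ch.{scene.chapter}: {who.name} contradicts {sorted(broken)}")
    return errors

cast = {"mara": Character("Mara", traits={"risk-averse"})}
scenes = [
    Scene(chapter=3, character="mara", contradicts=set()),
    Scene(chapter=14, character="mara", contradicts={"risk-averse"}),
]
for err in check_consistency(cast, scenes):
    print(err)  # ch.14: Mara contradicts ['risk-averse']
```

The hard part, obviously, isn't the twenty lines of bookkeeping — it's getting authors to externalize traits and scenes into structure at all. That's exactly the layer-building work the paragraph above is pointing at.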
Some domains will build it. Some will try to skip it — yolo edits on the binary, the AI holding everything in its head, the user hoping for the best. That works for short artifacts. I don't think it scales.
So: what domains do you think are going to need a real layer to catch up with code? And what properties tip you off?