All Bets Are Off
Outline
§1 — Zero marginal cost of production.
The cost of production is trending towards zero. Old measurements were bets; the odds moved. They’ve decoupled from what they tracked. So what now?
§2 — Coherence.
Therefore: Coherence is the new scarce resource. The value of externalizing it went up; the cost went down. Now we can write it all down.
§3 — Leverage.
But: Documentation is the foundation. What possibilities does that open up? Automating taste, the divergent-convergent loop, frequency vs. amplitude. The process is recursive: a ratchet.
§4 — “Too impractical.”
Therefore: Here’s what amplitude looks like in practice. Not faster — more thorough. The things that were never worth doing are now worth doing. These gains compound. Your competitors have the same tools.
§5 — Path dependence.
But: Most orgs will get this wrong. Resistance, co-option, thrash: three ways of refusing to place new bets.
§6 — New bets.
Therefore: The old bets were real, and now they’re off. Choose wisely.
Zero marginal cost of production (§1)
The naive read of AI coding tools is that we’ll take what we used to do in a month and do it in two weeks, then a week, then a day. Maybe, but the implicit assumption is that typing speed was already your limiting factor.
But AI isn’t changing every aspect of a business in the same way or to the same degree. Most of your business still runs at the speed of business. (Regulatory and compliance work still runs at the speed of government.)
What the numbers used to mean
A codebase with a million lines used to be worth something. Not because a million lines is inherently valuable, but because someone had to write them. Who would have spent all that time if the thing didn’t work? A million lines was evidence of a million decisions.
The million-line codebase is the most dramatic example, but the same thing happened everywhere:
- Lines of code used to mean effort.
- Coverage used to mean diligence.
- Velocity used to mean capacity.
- A 5K-line PR used to mean something had gone wrong.
None of these mean the opposite now. A codebase with high coverage might still reflect genuine diligence. A team with steady velocity might still be well-coordinated. But you can’t tell by looking at the number anymore. The number doesn’t confirm and it doesn’t deny. It just stopped being evidence.
If we haven’t already, we’ll see some Series A “scam” acquisitions where a flashy startup gets acquired and, once the purchaser does due diligence, they find that most of the repo is the scrawlings of a madman. A million lines, generated in weeks, signifying nothing. Worth less than nothing.
Why they all broke at the same time
Every one of those was an indirect measure — a bet that the thing you could measure would track the thing you couldn’t. Lines of code tracked effort. Coverage tracked diligence. Velocity tracked capacity. These were never the real thing. They were stand-ins.
They worked because you couldn’t hit the number without doing the work. Writing a thousand lines of coherent code required understanding the problem. Achieving 85% coverage required thinking about edge cases. Shipping consistently required genuine team coordination.
The indirect measures and the real things were linked by production cost. The cost was the authenticator.
When production cost drops, every indirect measure authenticated by that cost breaks at the same time. Not because anyone is gaming the system — in a healthy org, nobody is. But the numbers that used to require the underlying work no longer do. You can hit every metric on the dashboard and have done none of the thinking.
“LOC tells us something useful” was a bet. “Coverage means the code is solid” was a bet. The cost structure made those safe bets. The cost structure has changed.
Coherence (§2)
The goal was and remains: a high-quality product you can efficiently maintain and change over time.
So what correlates with that now?
Coherence.
By coherence I mean the structural property that makes the next change obvious. Not easy, necessarily — but obvious. You look at the existing patterns and you know where the new code goes, what it should be called, how it should behave.
That property exists at every level of the system. At the top it’s an architecture that maps cleanly to the business domain. In the middle it’s consistent patterns — one way to handle errors, one way to structure a service. At the bottom it’s naming conventions and file structure that don’t make you guess.
If code trends to zero marginal cost, then well-defined features start to trend to zero marginal cost as well. Coherence is what makes a feature well-defined. We’ll come back to why that matters.
Incoherence for humans
When humans do all the work, coherence lives in two places: the artifacts and the people. The code, the docs, the tests — and then everything the team just knows.
That second category is bigger than most teams realize. It’s not just “who understands the billing service.” It’s the shared scar tissue. “We tried event sourcing in payments and it was a nightmare, so we use simple CRUD everywhere now.” Nobody documented that as an architectural decision record. It’s just something the right people know, and they steer new work away from it instinctively. You don’t document flinches. The space of things you decided not to do is infinite.
Externalizing all of this — writing it down, keeping it current, making sure it reaches every engineer who needs it — was a real cost that competed with building. Teams made rational tradeoffs about how much to externalize.
Incoherence for machines
LLMs do not have 1:1s with your coworkers. LLMs do not even have memory. Their long-term memory is the artifacts.
So now two tradeoffs have fundamentally changed:
- the value from encoding coherent business thinking into the artifacts goes up.
- the cost of encoding coherent business thinking into the artifacts goes down.
Take the event sourcing example. When the humans wrote all the code, the three people who remembered the payments disaster would steer new work away from it. An LLM has no scar tissue. If nothing in the codebase or the docs says “we don’t do event sourcing in payments,” the LLM will cheerfully propose it — and generate a clean, well-structured, completely institutionally incorrect implementation. The value of having that decision written down went from “nice to have” to “the difference between useful output and output you throw away.”
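The flinch itself can be made machine-readable. Here’s a minimal sketch, in Python, of encoding a “we don’t do this here” decision as a CI guard; the paths, tokens, and rule names are all hypothetical:

```python
"""Minimal sketch: encode an institutional 'we don't do this' decision as a
machine-checkable guard, so an LLM (or a new hire) can't silently reintroduce
it. Paths, tokens, and rule names are hypothetical."""

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Flinch:
    """One externalized piece of scar tissue."""
    rule_id: str
    scope: str                   # directory the rule applies to
    forbidden: tuple[str, ...]   # tokens that signal the banned pattern
    rationale: str               # the history an LLM can't intuit

FLINCHES = [
    Flinch(
        rule_id="no-event-sourcing-in-payments",
        scope="services/payments",
        forbidden=("EventStore", "apply_event", "event_sourced"),
        rationale="The payments rewrite made event sourcing a nightmare; "
                  "we use simple CRUD here. (Hypothetical ADR.)",
    ),
]

def check_file(path: str, source: str) -> list[str]:
    """Return violation messages for one file's contents."""
    violations = []
    for flinch in FLINCHES:
        # Only apply the rule inside its scope.
        if not PurePosixPath(path).is_relative_to(flinch.scope):
            continue
        for token in flinch.forbidden:
            if token in source:
                violations.append(
                    f"{path}: '{token}' violates {flinch.rule_id} -- {flinch.rationale}"
                )
    return violations
```

Run it over a PR’s changed files in CI and the scar tissue travels with the repo instead of with the three people who remember.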
And the cost of writing it down dropped. The same tool that can’t intuit the flinch can help you externalize it. Point the LLM at the payments service, tell it the history, and ask for an architecture decision record. Five minutes. The document that nobody was going to spend an afternoon writing now costs almost nothing to produce.
Now we can write it all down. It’s cheaper than it’s ever been, and it matters more than it ever has.
Leverage (§3)
Documentation is the foundation. What possibilities does that open up?
Automating taste
Once coherence is externalized, the next question is whether you can measure it. Nobody has a coherence score for a codebase today. But we’re close.
Static analysis was the first generation of automated judgment. It could measure what’s mechanically computable, like cyclomatic complexity. It was a blunt instrument, but it was an honest attempt to automate taste.
The next generation is already visible: an LLM running on every CI pipeline, assessing the fuzzier qualities that previously required a senior engineer’s eye. Does this PR introduce a new pattern where an existing one would do? Is the naming consistent with the rest of the module? How far has the actual code drifted from the documented architecture?
These assessments get scored per-PR, trended over time, and used as guardrails. The coherence score becomes correctness infrastructure.
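Most of that rubric needs an LLM’s judgment, but some slices of coherence are mechanically checkable today. A minimal sketch of one such slice, naming-convention consistency within a module; the scoring heuristic is my own illustration, not an established metric:

```python
"""Minimal sketch of one mechanical slice of a 'coherence score': how
consistently a module's function names follow its dominant naming style.
The heuristic is illustrative, not an established metric."""

import ast
import re

SNAKE = re.compile(r"^[a-z_][a-z0-9_]*$")
CAMEL = re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$")

def naming_coherence(source: str) -> float:
    """Share of function names matching the module's dominant style.

    1.0 means perfectly consistent; lower values flag drift worth a look.
    """
    names = [node.name for node in ast.walk(ast.parse(source))
             if isinstance(node, ast.FunctionDef)]
    if not names:
        return 1.0
    snake = sum(bool(SNAKE.match(n)) for n in names)
    camel = sum(bool(CAMEL.match(n)) for n in names)
    return max(snake, camel) / len(names)
```

Score every PR, trend the number over time, and you have one primitive of the guardrail described above.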
Frequency vs. amplitude
In systems design there’s a pattern called divergent-convergent thinking.
- Divergent thinking is spitballing. “No bad ideas.”
- Convergent thinking is analysis and verification. “Doing the homework.”
The design process is framed as a repeating pattern of divergent → convergent → divergent → convergent.
/----------\ /----------\ /----------\
/ \ / \ / \
-* *--* *--* *-->
\ / \ / \ /
\----------/ \----------/ \----------/
It’s often visualized as a diamond: scope widens, then winnows to the practical, then the cycle repeats.
LLMs are comically good spitballers. But they’re mediocre verifiers, almost by definition. Coherence and correctness infrastructure are investments in guiding the divergent phase and implementing the convergent phase.
One option is to leverage LLMs to run this process at a higher frequency. But doing the same thing you were doing before, only faster… that’s only so interesting, and kind of exhausting.
/\ /\ /\ /\ /\ /\ /\ /\
/ \/ \/ \/ \/ \/ \/ \/ \
── ──>
\ /\ /\ /\ /\ /\ /\ /\ /
\/ \/ \/ \/ \/ \/ \/ \/
The first thing we all do with a new tool is replicate what we were already doing. That’s not wrong; it’s the natural first step. The task is to not mistake it for the full set of unforeseeable new options that will open up.
But frequency isn’t the only dial.
Amplitude: Could we do drastically more in one cycle than we used to, because doing so in the old world would have been cost prohibitive or downright impossible?
/----\ /----\ /----\
/ \ / \ / \
/ \ / \ / \
/ \ / \ / \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
/ \ / \ / \
---* * * *--->
\ / \ / \ /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
\ / \ / \ /
\ / \ / \ /
\ / \ / \ /
\----/ \----/ \----/
Wider divergent phase: more options explored per cycle, more approaches prototyped, more ideas tested against reality before committing to one. Wider convergent phase: more thorough verification, denser correctness infrastructure, the kind of rigor that was always valuable but never budgeted for.
And here’s the thing that makes this more than a one-time trick: the process is recursive. The machine helps you build the convergent infrastructure — the tests, the lint rules, the architecture docs — and that infrastructure constrains and improves the machine’s next round of divergent output. Better generation means better infrastructure gets built on top of it. The widening isn’t a single gesture. It compounds.
Don’t do what you were doing before, but faster. Do the things that were always valuable but never justifiable under the old cost assumptions.
The engineers and teams that get the most out of this shift won’t be the ones shipping the same roadmap at higher velocity. They’ll be the ones who recognize that the entire set of things “worth doing” has expanded, and are systematically exploiting the new tradeoffs.
“Too impractical” (§4)
“That refactor isn’t worth it right now.” “Good enough for v1.” “Nobody’s going to write 200 test cases for that edge case.”
Every engineering team has a version of these sentences. Here’s what happens when they stop being true.
Not faster. More thorough.
I needed to build mock APIs with realistic backing data. In the old world, an engineer spends a day, writes maybe 2000 lines, covers the happy path and a few known edge cases. That’s a 90% solution, and everyone agrees it’s good enough, because going further means another day of tedious hand-written data and there’s other work to do.
With an LLM, the 90% solution takes an hour. But the other bottlenecks haven’t moved. Code review still takes the time it takes. Integration still takes the time it takes. Alignment with the team still takes the time it takes. So raw speed on the implementation isn’t the constraint worth optimizing.
The real move is to spend the same day you would have spent before, but instead of a 90% mock, you produce a mock with comprehensive test scenarios, realistic edge cases, failure modes, varied data shapes — the kind of thoroughness that nobody would have budgeted for previously. Not 90% faster. 500% more thorough in the same time envelope.
And that thoroughness isn’t just nice to have. That mock data becomes correctness infrastructure. Every feature built on top of it now has a richer, more realistic environment to be tested against.
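What “comprehensive” can mean here: instead of hand-writing a few records, enumerate the axes of variation and cross them. A minimal sketch, with hypothetical field names and edge values:

```python
"""Minimal sketch of the 'more thorough' mock: rather than a handful of
happy-path records, cross data shapes with edge values to get a full
scenario grid. Field names and values are hypothetical."""

from itertools import product

# Axes nobody would have hand-written before.
NAMES = ["Ada", "", "名前", "O'Brien", "a" * 256]    # empty, unicode, quote, overlong
BALANCES = [0, -1, 10**12, 0.1 + 0.2]                # zero, negative, huge, float precision
STATUSES = ["active", "suspended", "deleted", None]  # including missing

def mock_accounts() -> list[dict]:
    """Cross every axis: 5 x 4 x 4 = 80 scenarios from three short lists."""
    return [
        {"name": n, "balance": b, "status": s}
        for n, b, s in product(NAMES, BALANCES, STATUSES)
    ]
```

Three short lists yield eighty scenarios; adding one more edge value to any axis multiplies through for free, which is exactly the kind of thoroughness the old budget never covered.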
Exploration collapses into proof
Exploration used to have to be budgeted and managed in steps. A proposal, a time allocation, a research spike, a partial implementation, a review — a multi-week process before anyone sees concrete results.
Instead, a proposal doc can arrive with an attached reference PR: a full working implementation, or close to it.
If the proposal gets refined based on feedback… regenerate the PR. The exploration and the proof collapse into one artifact.
This workflow has no analogue in the old world. It’s not a faster version of the old process. It’s a different process.
The old world separated “should we do this?” from “can we do this?” because answering the second question was expensive. When it’s cheap, you just answer both at once.
Let’s get weird
Those examples are conservative. Where could the “too impractical” calculus go from there?
Some things that wouldn’t have survived a planning conversation six months ago:
- Self-healing CLAUDE.md. Use the LLM to write a CI job that uses the LLM to analyze a PR for divergences between the new code and existing CLAUDE.md files. When a PR changes a pattern that contradicts a CLAUDE.md, generate a proposed CLAUDE.md update and a proposed code revert. Let the reviewer pick: did we change the convention, or did we violate it? Forces the decision to be explicit either way.
- Convention extraction from code review comments. Mine your team’s PR review history for recurring feedback patterns. “We always ask people to use the error wrapper.” “We always flag direct database access outside the repository layer.” Generate lint rules from the things humans keep repeating. Your reviewers have been writing a spec for years — it’s just trapped in GitHub comments.
- Invariant mining. Point the LLM at your test suite and ask it to infer implicit invariants — things that are true across every test but never stated as a rule. Then generate lint rules or property tests that enforce them explicitly. The tests knew something the codebase didn’t say out loud.
- Test generation from prod incidents. When a bug hits production, have the LLM write a regression test, but also have it scan for structurally similar code paths and generate speculative tests for those too. The incident becomes a pattern detector, not just a point fix. Every bug you find makes the next bug harder to ship.
- PR-to-PR pattern drift. Track the patterns introduced across the last N merged PRs. Flag when the same problem is being solved three different ways across three PRs by three people (or three LLM sessions). Nobody sees drift in real time. An LLM reading across PRs can.
- Architecture doc staleness detector. LLM reads the actual code, reads the architecture docs, flags divergence. “The docs say payments uses REST, but there are three gRPC endpoints now.” Reverse the usual flow — instead of updating docs from decisions, update docs from reality.
- Mutation testing on steroids. Have the LLM generate semantically meaningful mutations — not random bit flips, but plausible mistakes an LLM might actually make. “What if someone used optimistic locking here instead of pessimistic?” If the test suite doesn’t catch it, that’s a real gap, not a synthetic one.
- Dependency impact simulation. Before upgrading a dependency, have the LLM read the changelog and your usage of the library, then generate a set of “things that might break” as test cases. Run them before you upgrade. Turn the changelog into a pre-flight checklist.
These were all, practically speaking, non-starters under the old cost structure. Not because they were bad ideas — because the implementation hours dwarfed the payoff.
But implementing any of these isn’t an independent win. Each one would produce infrastructure that the others consume. The system’s fabric gets stronger with every piece you add.
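To make one of these concrete, here’s a minimal sketch of the convention-extraction idea: normalize review comments and count repeats, so recurring feedback surfaces as lint-rule candidates. The comments and threshold are hypothetical, and a real version would cluster similar phrasings with an LLM rather than exact-match:

```python
"""Minimal sketch of 'convention extraction': mine recurring review feedback
by normalizing comments and counting repeats. Comments and threshold are
hypothetical; a real version would cluster with an LLM, not exact-match."""

import re
from collections import Counter

def normalize(comment: str) -> str:
    """Collapse code spans, casing, and whitespace so near-identical
    feedback collides into one bucket."""
    no_code = re.sub(r"`[^`]*`", "<code>", comment)
    return " ".join(no_code.lower().split())

def recurring_feedback(comments: list[str], min_count: int = 3) -> list[tuple[str, int]]:
    """Return normalized comments seen at least min_count times --
    candidates for promotion into an explicit lint rule."""
    counts = Counter(normalize(c) for c in comments)
    return [(text, n) for text, n in counts.most_common() if n >= min_count]
```

Point it at a few years of PR history and the spec your reviewers have been writing in comments starts to fall out.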
These gains compound. Each piece of infrastructure improves the next rounds of generation and verification, which means the next piece of infrastructure lands better too.
Your competitors have the same tools. The question is whether they’re investing in this coherence infrastructure or just trying to turn the crank faster.
Path dependence (§5)
The honest response
Everything in §1-§4 requires changing how you work. A natural response to that, when you’ve spent years getting good at the old way, is: no. That’s not irrational. It’s protective.
When React arrived in 2013, it violated every established best practice in frontend engineering. Some developers called it a fad. They were wrong, but their skepticism wasn’t stupid — it was calibrated to a world where those best practices had been genuinely correct. What changed wasn’t the quality of their judgment. What changed were the constraints their judgment was calibrated to.
That’s resistance. It says: this threatens something real, and I’m not ready to let go of it.
The worse failure mode
There’s another response, and it’s more dangerous. Call it co-option. This is where you technically adopt the new thing but use it to preserve every existing structure. Same org chart, same process, same estimation methods, same job descriptions — now with a subscription.
You can already see it. Jira integrations that auto-generate status updates. Sprint retrospective summarizers. AI-powered ticket estimation. Same work, same structure, same assumptions — now with a chatbot bolted on. And co-option is self-reinforcing: the tooling creates jobs, the jobs create advocates, the advocates entrench the tooling. This will happen at massive scale.
The worst failure mode
There’s a third response, and it’s the most destructive. Call it thrash. This is where someone fully embraces the new tool, points it at everything, and generates at full speed with no spec, no architecture, no convergence infrastructure — just output. PRs pile up. Code ships. Activity is visible on every dashboard. And the codebase gets worse on every merge, because volume without direction isn’t progress. It’s the politician’s syllogism applied to engineering: AI is transformative; I am using AI; therefore I am transforming.
Resistance preserves the old structure by refusing the new tool. Co-option preserves the old structure by absorbing the new tool. Thrash destroys the old structure and replaces it with nothing. All three end up in the same place: no coherence, no compounding, no infrastructure that makes the next cycle better. These are three ways of refusing to place new bets.
Banning the word calculator vs. using GPT to grade the same old assignment
An education parallel captures the first two failure modes cleanly. Banning GPT essays is resistance: honest, protective, and ultimately a losing move, because the word calculator isn’t going away. Using GPT to auto-grade the same five-paragraph essays is co-option: technically it’s “adopting AI,” but it preserves the exact measurement that stopped measuring what it was supposed to measure.
The hand-written essay was an indirect measure of critical thinking. If the measure is dead, the right move is neither banning the tool nor automating the old measure. It’s raising the bar: teaching critical thinking with the tools students will encounter in the world today.
The distinction
- Resistance is at least honest about the stakes.
- Co-option pretends the stakes don’t exist.
- Thrash pretends the work is the stakes.
Of the three, co-option and thrash are harder to fight, because both look like progress.
All three are cultural problems wearing technical clothes.
- Resistance is an identity problem.
- Co-option is a bureaucratic self-preservation problem.
- Thrash is a leadership problem.
The technical prescription — coherence and correctness infrastructure, compounding — is necessary but not sufficient. The organizational self-reflection required to actually adopt it is a different essay.
New bets (§6)
Everyone views the new thing in the lens of the old. It’s the only lens we have to start. The question is whether you’re going to get stuck there, or whether you can start to acquire new lenses.
The engineers who called React a fad weren’t wrong about their craft. The teachers banning AI essays aren’t wrong about critical thinking.
Every heuristic, every indirect measure, every definition of “worth doing” was a bet placed against a specific cost structure. Lines of code measured effort because effort was expensive. “Good enough for v1” was rational because thoroughness cost more than it saved. Estimation worked because implementation was the bottleneck. These were all good bets. They paid off for years.
But the cost structure is being rewritten, and now they might not be.
The indirect measures decoupled from what they measured. The set of things worth doing expanded past what the old calculus can see. And the most dangerous response isn’t refusing to adapt — it’s adopting the new tools to preserve the old assumptions.
All bets are off. New table. No limit. Choose wisely.
Appendix
Related thoughts
Craftsmanship
- The Woodwright’s Shop (1979-2017)
- The New Yankee Workshop (1989-2009)
We have all been writing code for 50 years like Roy Underhill. Those who decide to retain their artisanal path but adopt the new tools will go the way of Norm Abram. Those who choose to follow the path of scale will have to learn some patterns that may feel a lot like the advent of the modern factory. It’s bittersweet to see the twilight of the golden age.
Sea Change
- Margin Call (2011)
When a sea change comes, it is not obvious to most until the time for meaningful action has long passed.