Does AI/Vibe Coding Really Deliver 10x Productivity?
In early 2026, Anthropic published a case study: 16 Claude agents, working in parallel Docker containers, wrote 100,000 lines of Rust code in a few weeks — a C compiler that could successfully compile the Linux kernel. The API bill came to roughly $20,000. By almost any measure, it was an extraordinary result.
Then I mentioned it to a friend of mine, a CTO at a small startup. His response: "The best strategy right now is probably to wait."
That tension — between a genuine technical milestone and a seasoned engineer's skepticism — is what this post is really about.
The Noise Around 10x
The AI productivity conversation has a familiar shape. On one side: breathless claims about 10x engineers, the death of junior developers, and Karpathy's "Software 3.0." On the other: practitioners who've tried the tools and found the reality more complicated than the demos suggest.
He cuts through both extremes well: when an entire industry's revenue depends on harvesting people's anxiety — you'll be left behind if you don't adopt this now, or you can get rich with this immediately — that's a warning sign. And he's right that most people claiming "AI works perfectly now" are overstating it.
But the macro trend is real. The people dismissing it entirely are also wrong. The truth is somewhere more interesting than either camp admits, and I've spent the past several months trying to find it — through a personal finance management system, this blog system, and a resume screening agent.
Why Vibe Coding Disappoints in Practice
"Vibe coding" — Karpathy's term — describes a mode of development where you describe what you want in natural language and let the AI figure out the rest. It sounds liberating. In demos, it looks effortless. Here's what the demos don't show:
Sessions drift. AI models have context windows, and long conversations accumulate noise. After an hour of back-and-forth, the model loses track of constraints established early in the session. The code it writes in hour three doesn't quite match the architecture it agreed to in hour one.
Without tests, AI solves the wrong problem efficiently. This is the killer. If your verification loop is loose — if you're eyeballing the output — the AI will confidently build exactly what you didn't ask for. The Anthropic compiler team spent enormous effort on high-quality test suites precisely because an autonomous AI without a tight feedback loop goes off the rails at speed.
Spec-driven development feels like losing your pairing partner. I tried breaking features into 40+ granular tasks with full specs. It reduced chaos and the AI stayed on track. But it stopped feeling like pair programming. I was waiting, not collaborating. The overhead of maintaining the spec system started eating the productivity gains.
Multi-agent, multi-window multiplies your context switching, not the AI's. I experimented with running multiple Claude instances in parallel. The bottleneck turned out to be me — my own attention was the scarce resource.
None of this means the tools don't work. It means they require a different discipline than the demos suggest.
What Actually Gets You to 10x
After months of daily use across multiple projects, these are the practices that actually move the needle.
1. CLAUDE.md is your foundation. A well-maintained project context file is the single highest-ROI investment you can make. It prevents the AI from repeating mistakes, re-learning your conventions, or contradicting decisions you've already made. Every time the AI does something wrong that it shouldn't have needed to be told, that goes in CLAUDE.md. The compounding effect is real.
2. Plan → Review → Execute, not vibe → fix → vibe.
The workflow that works: use planning mode to design the approach, review it before touching code, then execute. I've been combining planning-with-files and superpowers skills for this — persistent markdown plans the AI can read across sessions, with structured review checkpoints. The overhead feels like friction until you realize how much rework you're avoiding.
3. Replace vibe with tools wherever you can. Every time you ask an AI to "figure out" something that's actually deterministic — reading a config, running a query, executing a known command sequence — you're introducing unnecessary uncertainty. MCPs, scripts, and templates convert vague AI judgment calls into reliable tool invocations. The more you constrain the problem space, the more predictable the output.
4. Tests close the loop. Not because testing is virtuous in some abstract sense, but because without automated verification, you can't let the AI work autonomously. Tests are what transform AI from an assistant that needs constant supervision into something that can run unsupervised. Contract-driven development — define the interface, let AI generate tests from it, then implement to pass those tests — has worked well for me.
5. Do less, better.
AI makes it easy to generate a lot of things: documentation, boilerplate, edge case handlers, abstractions. Most of it makes the codebase worse. The discipline is subtraction: .claudeignore to keep the context clean, concise docs that stay synchronized, small iterations that stay reviewable. Complexity is cheap to create and expensive to maintain.
6. Agile thinking collapses into one person. The AI era has effectively collapsed the design/dev/test/deploy cycle for solo projects. What used to be sequential ceremonies — sprint planning, standup, review, retrospective — can now happen within a single focused session. Small iterations, tests first, immediate feedback. This is where the gains are most tangible and most reliable.
7. Build and maintain a knowledge base. Architecture decision records, design best practices, conventions your team has settled on — these matter enormously. When the AI has access to well-structured reference material, it stops guessing and starts reasoning from your actual context. A good knowledge base is the difference between an AI that understands your system and one that hallucinates a plausible-looking alternative.
8. Give it an example to copy from. This one is underrated. When I provide a concrete reference — an existing component, a similar API, a prior implementation — the output quality jumps dramatically. AI is exceptionally good at pattern-matching and adaptation. Without an example, it invents its own patterns. With one, it extends yours. The difference in speed and quality is easily 10x.
The honest number? For the right tasks — greenfield features, focused refactors, documentation, test generation — I see something close to 3–5x on a good day. Not every day, not for everything. But consistently, for the tasks where it works, it's genuinely transformative.
From Solo to Team Scale
Here's where it gets complicated.
Individual productivity gains are real. But organizations are not simply collections of individuals, and the math stops being straightforward when you try to reason about what AI means at team or company scale.
If one engineer with AI can do the work of several, the immediate instinct is: great, we need fewer people, or we can do more with the same team. Both are probably partially true. But the dependencies between teams don't shrink just because individuals move faster. An IC who can build ten things still needs product decisions, design reviews, security sign-off, and infrastructure support. Individual speed can surface coordination costs more visibly than it resolves them.
A few specific challenges I see at scale:
AI over-engineers collaboration surfaces. When AI generates perfect documentation, perfectly structured code, and exhaustive test coverage, it paradoxically becomes harder for colleagues to contribute. The bar for a pull request quietly rises. Disagreements become harder to resolve because the AI-generated baseline looks authoritative. Sometimes a messy codebase is easier to change than a pristine one.
Production risk doesn't disappear. An incident in production can cause irreversible damage — data loss, customer impact, compliance violations. AI cannot and should not have autonomous authority over production systems. The human-in-the-loop requirement isn't a limitation to work around; it's the right design. AI plans, humans approve and execute.
Skill atrophy is a real risk. If you rely on AI for every technical decision, what happens when the AI is wrong and you can't tell? The engineers who get the most from AI are the ones who understand what it's doing well enough to catch its mistakes. That requires maintaining real technical depth — which sometimes means doing things the hard way.
Management expectations can be the most dangerous failure mode. "If AI gives us 10x productivity, we should be doing 10x the work" — applied uncritically, this logic leads to burnout, quality collapse, and the loss of conditions that make AI-assisted development work in the first place.
What helps at scale: shared MCP libraries that standardize how AI interacts with your systems, high cohesion and low coupling at the individual level so AI-assisted work has fewer dangerous blast radii, and honest conversations about what AI can and can't be trusted to do unsupervised.
The Honest Answer
AI/Vibe Coding is developing fast and the productivity gains are real. I've experienced them firsthand. But it's an easy-to-start, hard-to-master discipline. The ceiling is high. So is the learning curve.
His instinct to wait isn't wrong, exactly. The hype warrants skepticism, and the tools are still rough around the edges. But they aren't standing still. The gap between an engineer who has figured out how to work well with AI and one who hasn't is widening every month.
The limiting factor isn't the tools anymore. The tools are good enough. The limiting factor is the engineer — their discipline, their taste, their willingness to invest in the feedback loops and context structures that let AI work well.
The question isn't whether AI can deliver 10x. In the right conditions, it can, and then some. The question is whether you've built the conditions.
