AI software without failures: why the spec matters

Vibe coding is tempting for a quick demo, but in production it breeds cost and risk. Here's why AI software should start with a specification and architecture - and how much it saves across the whole lifecycle.

Jakub Chmielewski · AI Adoption Partner · Published June 28, 2026 · 11 min read

Custom software
AI in practice

In short: vibe coding - building software with AI without reading or understanding the code it produces - is great for a prototype or to test an idea. In production it turns into cost: unstable code, security holes, and fixes that break the next thing. Spec-driven development flips the order. First you settle the needs, features, documentation and architecture - the specification. Only then does AI build the code to that plan. It’s the same proven engineering process as always, with AI compressing its slowest stages. The result: fewer fixes, lower maintenance cost, and software you can grow - not just demo.

What is “vibe coding”, and why is everyone suddenly talking about it?

It’s building software by telling AI what you want and accepting the code it returns - without getting into how it works. The term was coined in February 2025 by Andrej Karpathy, a founding member of OpenAI, the company behind ChatGPT: “a new kind of coding […] where you fully give in to the vibes […] and forget that the code even exists”. The idea caught on so widely that “vibe coding” became Collins Dictionary’s Word of the Year 2025.

The appeal is obvious: in an hour you have a working demo without writing a line yourself. But Karpathy added a caveat in the same breath - it’s fine “for throwaway weekend projects”. The term was born with a note about its scope. The trouble starts when that note disappears and “vibes” end up in software your business runs on.

Where vibe coding works, and where it starts to cost

It works wherever the code is disposable by design or cheap to throw away:

A prototype - you want to see an idea on screen before deciding whether to build it.
A version to validate - you show the concept to customers to gauge interest.
A throwaway internal tool - a script that computes something once.
Learning - the fastest way to see what’s possible today.

A quick prototype is the right tool in the right place - you reach for it when an idea needs validating, before you invest in building. The line was put best by engineer Simon Willison: “If an LLM wrote the code for you, and you then reviewed it, tested it thoroughly and made sure you could explain how it works to someone else - that’s not vibe coding, it’s software development”. The difference isn’t that you use AI. It’s whether someone understands and controls what was produced. For a prototype, they needn’t. For software that handles your customers and your data, they must.

What breaks when “vibes” hit production?

The last 30% breaks. Engineer Addy Osmani calls it the “70% problem”: AI tools deliver 70% of the result surprisingly fast, but “that final 30% becomes an exercise in diminishing returns”. That 30% is exactly what decides whether software is production-ready: edge cases, bugs, security, integration with the rest of the company. Osmani calls the result “house of cards code” - “it looks complete but collapses under real-world pressure”.

The hardest evidence is about security. In Veracode’s study across more than 100 models, 45% of generated code failed security tests and introduced vulnerabilities from the OWASP Top 10 - and the result didn’t improve with model power. The effects show up in real apps: an analysis of more than 5,600 applications built on vibe-coding platforms found over 2,000 vulnerabilities, more than 400 exposed secrets and credentials, and 175 cases of leaked personal data.

Even developers don’t trust it. In the Stack Overflow 2025 survey, 84% use or plan to use AI tools, yet only 3.1% “highly trust” the accuracy of their output, and 66% are frustrated by answers that are “almost right, but not quite”. “Almost right” is precisely the code someone eventually has to understand and fix - usually at more cost than if it had been built right the first time.

Aspect	Vibe coding	Spec-driven development
When	prototype, testing an idea	software your business runs on
Starting point	“let’s see what comes out”	needs, features, architecture
What you get	a quick demo	a product you can grow
Maintenance cost	rises with every change	predictable
Who understands the code	often no one	your team and AI share one plan

Why is architecture the thing vibe coding can’t see?

Because vibe coding looks at one screen at a time, and architecture is the plan for the whole. Software architecture describes how the parts connect and how the whole will grow - the equivalent of a building’s structural design. A house built without one stands until you add a floor. Software without architecture works until the first real growth arrives: more users, a new process, a connection to your warehouse or accounting system.

That’s where the bill appears. An AI model asked for another feature drops it wherever it happens to fit - not where the plan says it should go, because there is no plan. After a dozen such steps, every change touches everything at once, and a fix in one place breaks three others. Architecture designed up front is cheap - a few decisions and a diagram. Architecture “discovered” after the fact, once the product is live with customers, means a rebuild that often costs more than starting over. Vibe coding doesn’t skip architecture out of malice. It simply can’t see it.
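To make “a few decisions and a diagram” concrete, here is a minimal sketch of an architecture rule written down as data and checked automatically. The layer names (`ui`, `services`, `storage`, `domain`) and both dependency maps are hypothetical, invented for this illustration; a real project would extract the actual dependencies from its codebase rather than type them in by hand.

```python
# Hypothetical plan: which layer is allowed to depend on which.
ALLOWED = {
    "ui": {"services"},
    "services": {"domain", "storage"},
    "storage": {"domain"},
    "domain": set(),
}


def violations(dependencies: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return every (source, target) dependency the plan does not allow."""
    found = []
    for source, targets in dependencies.items():
        for target in targets:
            if target not in ALLOWED.get(source, set()):
                found.append((source, target))
    return sorted(found)


# Hypothetical state of the code after a dozen unplanned additions.
ACTUAL = {
    "ui": {"services", "storage"},   # a screen talks to the database directly
    "services": {"domain", "storage"},
    "storage": {"domain"},
    "domain": {"ui"},                # core logic now depends on a screen
}

if __name__ == "__main__":
    for source, target in violations(ACTUAL):
        print(f"{source} must not depend on {target}")
```

With a plan like this in place, a feature dropped “wherever it happens to fit” shows up as a named violation on the day it is added, not as a rebuild a year later.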

What is spec-driven development?

It’s an approach where the specification comes first and the code is its result - not the other way round. The order is exactly the one a good engineer always used:

Needs - what business problem the software is to solve.
Features - what it must do, and how you’ll know it does it well.
Documentation - the assumptions, limits and rules, written down.
Architecture - how the parts connect and how they’ll grow with the company.
Only now, the code - written with AI, to that plan.

What’s new is the role of the specification. As GitHub, maker of one of the tools for this work, puts it, the spec is “a contract for how your code should behave and becomes the source of truth your tools and AI agents use to generate, test, and validate code”. AWS’s Kiro works the same way: a project starts from three documents - requirements, technical design and a task list - before a single line of code exists. AI doesn’t replace thinking about what you’re building. It speeds up writing it down and carrying it out.
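A minimal sketch of what “the spec as a contract” can look like, assuming a hypothetical invoicing feature. The requirement texts, the `Requirement` class and the `net_to_gross` function are invented for illustration and are not taken from GitHub’s or Kiro’s tooling. The point is only that each requirement carries its own acceptance check, so the document that states the need can also validate the code written to it.

```python
from dataclasses import dataclass
from typing import Callable


def net_to_gross(net: float, vat_rate: float) -> float:
    """Hypothetical feature under test: add VAT to a net amount."""
    if net < 0:
        raise ValueError("net amount must not be negative")
    return round(net * (1 + vat_rate), 2)


@dataclass
class Requirement:
    """One line of the specification plus the check that proves it."""
    ident: str
    text: str
    check: Callable[[], bool]


def rejects_negative() -> bool:
    """Acceptance check: a negative net amount must raise ValueError."""
    try:
        net_to_gross(-1.0, 0.23)
    except ValueError:
        return True
    return False


SPEC = [
    Requirement("INV-1", "Gross = net plus VAT, rounded to 2 decimals",
                lambda: net_to_gross(100.0, 0.23) == 123.0),
    Requirement("INV-2", "Negative net amounts are rejected",
                rejects_negative),
]


def validate(spec: list[Requirement]) -> dict[str, bool]:
    """Run every acceptance check and report pass/fail per requirement."""
    return {req.ident: req.check() for req in spec}


if __name__ == "__main__":
    for ident, passed in validate(SPEC).items():
        print(ident, "ok" if passed else "FAILED")
```

Whether the code is written by a person or generated by AI, it is judged against the same list, which is what turns “how you’ll know it does it well” into something a build can run.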

Why does starting from a specification come out cheaper?

Because the most expensive bug is the one caught late. Classic software-cost research (Barry Boehm, reproduced in an analysis by NASA engineers) shows that a bug found after release can cost up to 100 times more than the same bug caught at the requirements stage. Newer analyses temper the scale - one study of 171 projects did not confirm the effect as a universal rule - but the direction has repeated for decades: the later a problem surfaces, the more its fix costs. A specification moves problem-finding to the start, where a change is one sentence in a document, not a rebuild of a finished product.
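A back-of-the-envelope sketch of that argument. Every input here is an assumption made up for illustration: 20 defects, 200 currency units to fix one at the requirements stage, and a specification that catches 15 of them early. Only the multiplier of 100 comes from the research cited above, as an upper bound; the multiplier of 5 stands in for the tempered view.

```python
def total_fix_cost(defects: int, caught_early: int,
                   base_cost: float, late_multiplier: float) -> float:
    """Cost of fixing all defects when `caught_early` of them are found
    at the requirements stage and the rest only after release."""
    if not 0 <= caught_early <= defects:
        raise ValueError("caught_early must be between 0 and defects")
    late = defects - caught_early
    return caught_early * base_cost + late * base_cost * late_multiplier


if __name__ == "__main__":
    # 100 = upper bound cited in the text; 5 = an assumed, tempered value.
    for multiplier in (100, 5):
        no_spec = total_fix_cost(20, 0, 200, multiplier)
        with_spec = total_fix_cost(20, 15, 200, multiplier)
        print(f"x{multiplier}: without spec {no_spec}, with spec {with_spec}")
```

With these made-up inputs the totals are 400,000 against 103,000 at a multiplier of 100, and 20,000 against 8,000 at a multiplier of 5. The scale changes a lot; the direction does not.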

The same shows in project-failure statistics. Standish Group reports have for years - despite debate over their method - named incomplete and changing requirements as a leading cause of failed projects. Projects most often founder not on the code, but on no one settling precisely enough, at the start, what was to be built.

AI doesn’t repeal this rule - it sharpens it. In Google’s DORA 2024 report, a 25% increase in AI adoption was associated with an estimated 7.2% drop in delivery stability - plausibly because faster-generated code then needs more fixing. When AI writes in a flash, the bottleneck stops being the writing. It becomes stating clearly what should be built. The specification.

If AI is meant to speed things up, why add a specification?

Because the specification isn’t a brake - it’s where steering is cheapest. AI helps write it: turning a conversation about needs into written requirements and a feature list. Then it speeds up the slowest stage, the build itself, because it builds to a ready plan instead of guessing.

Let’s be honest about the numbers: there’s no hard, independent study yet saying “spec-driven is X% faster”. The approach is young, and the field itself admits it’s still working out the standards. What’s defensible is simpler and matters more to your budget: fewer fixes, less debugging, and software you can grow. A specification costs a few days up front. Its absence costs months at the end.

What happens after launch - maintenance, debugging, fixes?

This is where most of the bill is. Research on the software lifecycle shows that maintenance takes up the majority of total cost - estimates reach 60-90% - meaning what happens after launch costs more than building it. Software built “on vibes” enters this phase in debt: no one understands the code, so every fix is a risk.
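The arithmetic behind that claim, as a rough sketch. It assumes, as a simplification, that lifecycle cost splits into just two parts, build and maintenance; the build budget of 100,000 is a made-up figure, and `lifetime_cost` is a name chosen for this example. The share is passed as a whole-number percentage to keep the calculation exact.

```python
def lifetime_cost(build_cost: float, maintenance_pct: int) -> float:
    """Total lifecycle cost if maintenance is `maintenance_pct` percent of it.

    build = total * (100 - pct) / 100, so total = build * 100 / (100 - pct).
    """
    if not 0 <= maintenance_pct < 100:
        raise ValueError("maintenance_pct must be in the range 0..99")
    return build_cost * 100 / (100 - maintenance_pct)


if __name__ == "__main__":
    build = 100_000  # hypothetical build budget
    for pct in (60, 90):  # the range of estimates quoted in the text
        total = lifetime_cost(build, pct)
        print(f"maintenance {pct}%: total {total:,.0f}, "
              f"after launch {total - build:,.0f}")
```

For a 100,000 build, a 60% maintenance share means 250,000 over the lifecycle, and a 90% share means 1,000,000. The quote for the build is the smaller part of the decision either way.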

Spec-driven gives a lasting edge here. The specification stays a living reference point - for your team and for AI. On that foundation, monitoring runs continuously: AI watches production, catches errors, points to their source and proposes a fix a human approves. This isn’t a promise from the future. Sentry reports that its debugging assistant identified the root cause with 94.5% accuracy across more than 38,000 issues (that’s the vendor’s own data, not an independent audit - but it shows the direction). That’s what control looks like after launch: tests before release catch known scenarios, continuous monitoring catches the ones tests didn’t foresee. One without the other is half a control.

What does this mean when you commission or build software?

That you have the right to demand a plan before anyone writes code. That’s what separates a product from a prototype sold at a product’s price. Before the build decision is made, four things should be on the table:

The need, written as a business problem, not a feature list.
The features, with a clear test of “how we’ll know it works”.
The architecture, described enough to know how it will grow.
A maintenance plan - who will monitor it after launch, and how.

For your company, this model shifts the maths in favour of software tailored to your processes. When the specification is clear and AI shortens the build, a tailored app stops being a luxury. You no longer pay per-seat licences or for the 80% of a ready-made tool’s features you never use - the barrier to custom software is lower than ever. And because the starting point is a written specification, the know-how and documentation stay with you, not in a vendor’s black box. Your team understands what it has and can maintain it once the project ends.

Wondering whether a given process at your company is better served by a ready-made tool or a tailored app? We’ll show it on your own example during a free consultation.

Common questions

Is it better to commission custom software or buy ready-made? Ready-made wins when the process is standard and not your edge - accounting, email, a basic CRM. Custom pays off where the process is your competitive advantage and no box fits without compromise. Today, with AI shortening the build, that second path is cheaper than it was even a year ago. We show how to cost it in our piece on how much adopting AI costs.

Will my team maintain this software after the project ends? Yes - because the specification, documentation and architecture stay with you, and the team learns them as you go. That’s the whole difference between a partner and dependence on a single vendor.

Where to start? With one process that has a measurable effect, not a whole system at once. We describe the same principle for adopting AI in our piece on where to start.

Key takeaways

Vibe coding is a lack of control over the code, not merely using AI. Great for a prototype, risky in production.
The last 30% is the hardest - security, bugs and maintenance cost more than the first demo.
Architecture is the plan for the whole, and vibe coding can’t see it - bolted on later, it costs like rebuilding a house.
Spec-driven is a proven engineering process - needs, features, documentation, architecture, then code with AI.
The most expensive bug is the one caught late - a specification moves decisions to where change costs least.
Most of the cost is after launch - maintenance is the majority of the bill, and here specification plus continuous AI monitoring give the biggest edge.
Before you commission software, demand a specification. Its absence isn’t a saving - it’s a deferred cost.