Spec-Driven Development: origin, methodology, tools, and what the hype doesn't tell you
tl;dr — SDD is a legitimate response to a real problem with AI agents. The concept makes sense, the tools are immature, the cost is high, and ROI evidence is still almost entirely anecdotal. Worth understanding. Worth being skeptical about.
The problem that started it all
To understand why Spec-Driven Development emerged, you need to take a step back.
In 2025, AI agents like Claude Code, Cursor, and Codex stopped being glorified autocomplete and started executing complex tasks autonomously. With that came the phenomenon of vibe coding: you describe what you want in natural language, the agent generates hundreds of lines of code, and you approve without fully understanding what was done.
The result? Code that works until it needs to be maintained. Invisible architectural decisions. Technical debt accumulating at industrial speed.
The most rigorous study on the subject — conducted by METR in 2025 with experienced developers on real open source projects — concluded that developers using AI tools were, on average, 19% slower than without them. Not faster. The reason: unstructured prompts created debugging loops that consumed all the time saved in initial code generation.
Spec-Driven Development is the industry’s attempt to solve that problem at the root.
Where the idea comes from
The idea of writing specifications before writing code is not new. It is as old as software engineering itself.
1963: Margaret Hamilton, managing the software for NASA’s Apollo missions, coined the term “software engineering” because programs had grown beyond any one person’s ability to fully comprehend them. She realized: this is engineering, it needs process.
1968: NATO organized a conference in Garmisch, Germany, that formally identified the Software Crisis: computers now made it possible to write programs so complex that they could not be managed adequately. The volume of code had surpassed the human capacity to reason about it.
1972: Dijkstra, in his Turing Award lecture, summed it up: “As long as there were no machines, programming was no problem at all. When we had a few weak computers, programming became a mild problem. Now we have gigantic computers, and programming has become an equally gigantic problem.”
The response of the era was process: Waterfall as a DoD standard, then Agile in 2001, then CI/CD in the cloud making Agile viable at scale.
Now we are in the next cycle. Drew Breunig, the researcher who popularized the term SDD in 2026, described it precisely: “Our current software crisis is our inability to manage the complex codebases that new models enable. Before, the problem was that we couldn’t keep all the code in our heads. Now we can’t even read all of our code.”
AI agents enable waterfall-level volume at an agile cadence. That is the problem SDD attempts to address.
What is Spec-Driven Development
SDD is a development methodology that treats specification as the primary artifact — not code.
Instead of the traditional cycle of prompt → code → iteration, the flow becomes:
Spec → Plan → Tasks → Code
The spec defines intent, constraints, acceptance criteria, and architecture before any implementation. The AI agent then executes against that structured input rather than interpreting a vague description.
An important distinction: with tools like Spec Kit, Kiro, and Tessl, the spec itself is generated by the AI. You describe the goal in natural language, and the agent produces the specification files — typically requirements.md, design.md, and tasks.md — that it will then use as context during implementation. The spec is not a document written manually upfront by analysts; it emerges from the conversation between developer and agent, before any code is written.
This changes the diagnosis of classic problems with specifications: the distance between spec and code is no longer a matter of weeks or different teams. Spec and code are generated in the same cycle, by the same tool, minutes apart. The problem that persists is not temporal separation — it is the upfront work, the token cost, context degradation across iterations, and what happens after deploy when reality diverges from what was specified.
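As a concrete (and entirely invented) illustration of that artifact set, a scaffold for one feature might look like the following. The file names follow the requirements.md / design.md / tasks.md convention described above; the directory layout and contents are placeholders, not the output of any specific tool.

```python
from pathlib import Path

# Hypothetical per-feature spec scaffold. The three file names follow the
# convention Spec Kit / Kiro-style tools use; the bodies are placeholders.
ARTIFACTS = {
    "requirements.md": "# Requirements\n\nWHEN user submits login form\nTHEN ...",
    "design.md": "# Design\n\nFrontend: ...\nBackend: ...",
    "tasks.md": "# Tasks\n\n- [ ] Task 1: ...",
}

def scaffold(feature_dir: str) -> list[str]:
    """Create the per-feature spec directory and return the files written."""
    root = Path(feature_dir)
    root.mkdir(parents=True, exist_ok=True)
    written = []
    for name, body in ARTIFACTS.items():
        (root / name).write_text(body)
        written.append(name)
    return written

print(scaffold("specs/user-auth"))
# → ['requirements.md', 'design.md', 'tasks.md']
```

The point of the per-feature directory is that the agent's context for implementation is exactly these three files, not the whole conversation history.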
There are different levels of SDD adoption:
Spec-first: the spec is generated before implementation and used as a guide for that task. Once complete, it is discarded.
Spec-anchored: the spec is retained after the task to guide future evolution and maintenance of the system. It is a living artifact that travels alongside the code.
Spec-as-source: the most ambitious level. The spec is the source code. The generated code is merely a compiled artifact of the spec, marked with // GENERATED FROM SPEC - DO NOT EDIT. Tessl is attempting to make this viable.
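One way to enforce spec-as-source is to reject hand edits to generated files. Tessl's actual enforcement mechanism is not public; the sketch below only illustrates the idea of the marker, as a hypothetical pre-commit guard.

```python
# Hypothetical pre-commit guard for a spec-as-source workflow: flag any
# file whose first line carries the generated-code marker, since such
# files should only ever be regenerated from the spec, never hand-edited.
MARKER = "// GENERATED FROM SPEC - DO NOT EDIT"

def manually_edited_generated_files(changed: dict[str, str]) -> list[str]:
    """changed maps file path -> proposed new content for a commit."""
    return [
        path for path, text in changed.items()
        if text.splitlines() and text.splitlines()[0].strip() == MARKER
    ]

changed = {
    "src/auth.ts": MARKER + "\nexport function login() {}",
    "specs/auth.md": "# Auth spec",
}
print(manually_edited_generated_files(changed))  # → ['src/auth.ts']
```

A real guard would compare against the git index rather than an in-memory dict, but the contract is the same: generated code is an output, not an input.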
The SDD Triangle — and why the cycle is harder than it looks
The most honest contribution to the debate came from Breunig in March 2026, after building the whenwords project (an open source library with zero code — only spec + 750 conformance tests) and observing how similar projects evolved.
His central insight: SDD is not a linear equation. It is a feedback cycle.
He proposes the SDD Triangle: spec, tests, and code are three nodes that must stay synchronized at all times. When the code advances, the spec must be updated. When the spec changes, new tests need to be written. When tests fail, the code needs to change — and sometimes the spec was wrong too.
The problem is that keeping these three nodes synchronized is hard:
- Writing specs is hard. They are never exhaustive and are written before the software encounters the real world.
- Writing tests is hard. Even before agents, nobody enjoyed writing tests.
- Updating specs and tests after implementation feels like overhead, especially when you are using agents precisely to move fast.
- LLMs make silent decisions during implementation. Those decisions rarely find their way back into the spec.
The practical result: the spec is written, the code is generated, the product is shipped. The spec becomes stale within days. Nobody goes back to update it. The original problem — lost intent, invisible decisions — has simply migrated to a different level.
The tool ecosystem
The SDD tool space exploded between late 2024 and early 2026. It helps to understand the layers:
Layer 1 — Spec frameworks: define and manage specification artifacts → Spec Kit, Tessl, Kiro, BMAD, OpenSpec, cc-sdd
Layer 2 — Planning and task systems: convert specs into executable task graphs → Taskmaster, Agent OS, Beads, Feature-Driven-Flow
Layer 3 — Execution agents: write and modify code → Claude Code, Cursor Agent, Codex, Devika, OpenDevin, CrewAI
Layer 4 — AI IDEs: integrate all layers into a single workflow → Kiro, Windsurf, Cursor, Claude Code, Copilot
Most developers today use only Layer 3 — which is exactly where the vibe coding problem lives.
Table of relevant tools
| Tool | Type | Approach | Status |
|---|---|---|---|
| Spec Kit (GitHub) | CLI | Constitutional spec + 4 phases | Open source, GA |
| Kiro (AWS) | IDE | EARS notation, 3 documents | GA, free tier |
| Tessl | CLI + Registry | Spec-as-source (most ambitious) | Closed beta |
| BMAD | CLI | Multi-agent, role-based personas | Open source |
| OpenSpec | CLI | Proposal + approval workflow | Open source |
| Plumb | CLI | Spec/test/code sync via git hooks | PoC (pip install plumb-dev) |
| smart-ralph | CLI | Minimal SDD scaffold | Open source |
How to use it in practice: Kiro as an example
Kiro is the most accessible entry point for developers already using VS Code. It is a fork of Code OSS (the open source core of VS Code) built by a small team inside AWS, deliberately positioned outside the AWS ecosystem — you do not need an AWS account to use it.
Installation
Go to kiro.dev and download the installer for your operating system. Sign in with GitHub or Google.
The three-step workflow
Step 1 — Requirements
You describe what you want to build in natural language. Kiro translates that into EARS notation (Easy Approach to Requirements Syntax):
```
WHEN user submits login form
AND credentials are valid
THEN system must authenticate the user
AND redirect to the dashboard
AND log the login event with timestamp
```
This notation enforces explicit, machine-readable constraints. You review and adjust the generated requirements.md before moving forward.
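Because the shape is so regular, the WHEN/AND/THEN pattern is trivially machine-readable. The sketch below parses that event-driven form into conditions and responses; it covers only the pattern shown in the example, not the full EARS notation, and the `system must` stripping is an assumption about phrasing.

```python
# Minimal parser for the event-driven WHEN/AND/THEN shape shown above.
# AND continues whichever section (conditions or responses) came last.
def parse_ears(text: str) -> dict[str, list[str]]:
    req: dict[str, list[str]] = {"conditions": [], "responses": []}
    bucket = None
    for line in text.strip().splitlines():
        word, _, rest = line.strip().partition(" ")
        if word == "WHEN":
            bucket = "conditions"
        elif word == "THEN":
            bucket = "responses"
        elif word != "AND":
            raise ValueError(f"unexpected keyword: {word}")
        # Strip the assumed "system must" phrasing from responses.
        req[bucket].append(rest.removeprefix("system must ").strip())
    return req

req = parse_ears("""
WHEN user submits login form
AND credentials are valid
THEN system must authenticate the user
AND redirect to the dashboard
""")
print(req["conditions"])  # → ['user submits login form', 'credentials are valid']
```

This machine-readability is what lets the agent treat each clause as a checkable constraint rather than prose.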
Step 2 — Design
Kiro analyzes your existing codebase and generates a design.md with architectural decisions, stack choices, and component structure. For a React + Node.js project, you will see something like:
```markdown
## Architecture
Frontend: React 18 with React Router v6
Backend: Express 4.x with JWT middleware
Database: PostgreSQL via Prisma ORM
Auth: bcrypt (salt rounds: 12) + JWT (access: 15min, refresh: 7d)
Testing: Jest + Supertest for integration
```
You review this document before any code is written. Disagreements with the actual project architecture surface here — not after 400 generated lines.
Step 3 — Tasks
Kiro generates a tasks.md with discrete implementation steps, sequenced by dependency:
- [ ] Task 1: Database setup and user schema
- [ ] Task 2: POST /auth/register endpoint with Joi validation
- [ ] Task 3: POST /auth/login endpoint with token generation
- [ ] Task 4: JWT authentication middleware
- [ ] Task 5: POST /auth/refresh endpoint
- [ ] Task 6: Integration tests for all endpoints
You control which tasks to execute and when. The agent implements one task at a time, with review checkpoints in between.
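The "one task at a time, in dependency order, with a checkpoint between each" loop amounts to a topological sort with an approval gate. The sketch below uses invented task identifiers mirroring the tasks.md above; the dependency edges and the approval callback are illustrative assumptions, not Kiro's internals.

```python
from graphlib import TopologicalSorter

# Invented dependency graph for the six tasks above (task -> prerequisites).
deps = {
    "db-setup": set(),
    "register-endpoint": {"db-setup"},
    "login-endpoint": {"db-setup"},
    "jwt-middleware": {"login-endpoint"},
    "refresh-endpoint": {"jwt-middleware"},
    "integration-tests": {"register-endpoint", "refresh-endpoint"},
}

def run_tasks(deps, approve=lambda task: True) -> list[str]:
    """Execute tasks in dependency order, pausing at each review checkpoint."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        if not approve(task):   # the human review checkpoint between tasks
            break
        executed.append(task)   # here the agent would implement the task
    return executed

order = run_tasks(deps)
print(order[0], order[-1])  # → db-setup integration-tests
```

Rejecting a task halts the run, which is the property that keeps the agent's blast radius to one task at a time.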
Steering files
In addition to the three spec documents, Kiro supports steering files — persistent configuration files that define standards for the entire codebase:
```markdown
# .kiro/steering/code-style.md
- Use strict TypeScript, no implicit `any`
- Prefer `async/await` over callbacks
- Variable names in camelCase, constants in SCREAMING_SNAKE_CASE
- Public functions must have JSDoc
```
Hooks
Kiro supports event-based hooks — agents that trigger automatically in response to defined events:
```json
{
  "hooks": [
    {
      "name": "security-audit",
      "trigger": "on-save",
      "agent": "Check the saved file for security vulnerabilities"
    },
    {
      "name": "test-generator",
      "trigger": "on-file-create",
      "pattern": "src/**/*.ts",
      "agent": "Generate unit tests for the new file"
    }
  ]
}
```
What works well
- Reviewing a design document before implementation is fundamentally different from reviewing 500 generated lines of code afterward. Problems surface at the right level.
- Steering files eliminate a huge volume of repetitive prompts about style and conventions.
- The sequential task flow keeps the agent’s context focused — one problem at a time, not an entire system.
What does not work well
- In independent tests, Kiro generated 16 acceptance criteria for a simple bug fix. For small changes, the overhead is real.
- Spec generation for moderately complex features takes 30–45 seconds, which is a noticeable interruption in a fast development flow.
- EARS notation has a learning curve. It is not intuitive for anyone who has never worked with formal specifications.
- Kiro uses the Open VSX extension registry (not Microsoft’s), which means no official C# support: a serious limitation for .NET teams.
- Autopilot mode (executing multiple tasks without supervision) produces less predictable results. The per-task approval flow is where the real value lies.
Pricing
During the public preview: free (with interaction limits). GA: Free (50 interactions/month) · Pro $19/month (1,000 interactions) · Pro+ $39/month (3,000 interactions)
The real problems
1. Upfront work that goes against developer instinct
The promise of SDD with AI is that the spec is generated quickly, and it is. But “quickly” does not mean “free.” Before a single useful line of code, you go through multiple review cycles: reviewing the generated spec, correcting misinterpreted intent, adjusting the architectural plan, approving or rejecting tasks. Each cycle demands real human attention.
For a medium-complexity feature, that overhead can be less than the cost of debugging vibe-coded code afterward. For a targeted bug fix, the cost exceeds the benefit by a wide margin — Kiro generated 16 acceptance criteria for a simple fix in independent tests. The upfront work does not disappear because the AI generates the spec; it changes in nature. Instead of writing, you review and decide. It is less costly, but it is not zero.
2. Token cost multiplied across every phase
Each SDD phase (spec → plan → tasks → implementation) consumes tokens before a single line of production code is written. With reasoning models, agentic usage can be 100x greater than standard usage.
Heavy SDD sessions with Claude Code hit the context limit regularly. The automatic compaction process takes 3 to 12 minutes. The cost is not only financial — it is also time and flow interruption. No public benchmark compares the total cost (tokens + human review time) of SDD versus direct development.
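A back-of-envelope model makes the phase multiplication concrete. Every number below is an invented assumption (per-phase token counts, the blended price, the direct-prompting baseline); the only point is that each phase bills tokens before any production code exists.

```python
# Back-of-envelope cost model. All figures are invented assumptions,
# not measurements from any tool or public benchmark.
PRICE_PER_1K_TOKENS = 0.015   # assumed blended input/output price, USD

phases = {                    # assumed tokens consumed per SDD phase
    "spec":           40_000,
    "plan":           25_000,
    "tasks":          15_000,
    "implementation": 120_000,
}
direct_prompting = 60_000     # assumed tokens for prompt -> code directly

sdd_total = sum(phases.values())
print(f"SDD: {sdd_total:,} tokens -> "
      f"${sdd_total / 1000 * PRICE_PER_1K_TOKENS:.2f}")
print(f"Direct: {direct_prompting:,} tokens -> "
      f"${direct_prompting / 1000 * PRICE_PER_1K_TOKENS:.2f}")
print(f"Multiplier: {sdd_total / direct_prompting:.1f}x")
```

Under these made-up numbers SDD costs roughly 3x the tokens; the real ratio is unknown precisely because, as noted above, no public benchmark measures it.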
3. Context degradation across iterations
This is the least discussed problem and perhaps the most serious. AI-generated specs are fed back to the AI in the implementation phase. Each compaction cycle or session restart loses nuance. The agent implementing the code does not have full access to the decisions the spec-writing agent recorded.
The result: the code begins to subtly diverge from the spec as context degrades. The loop between spec and code, which should be synchronous, becomes a game of telephone — each phase amplifies small noise from the phase before.
4. Specs become misleading after deploy
With modern SDD tools, the spec is not “distant from the code” — it is generated in the same cycle, by the same tool, minutes before implementation. The classic problem of specs written by analysts disconnected from technical reality does not apply here.
The problem that persists is different: what happens after deploy. Edge cases only appear in production. Real user behavior diverges from what was modeled. Performance problems emerge under load. The spec is not updated. If it is used to guide future maintenance and evolution (spec-anchored), a stale spec becomes actively misleading: the agent trusts it, generates code based on a reality that no longer exists, and the developer only notices when the system breaks.
Breunig’s Plumb tool attempts to address this: a CLI that hooks into git commit, reads agent traces, extracts decisions made during implementation, and asks for developer approval before updating the spec. It is a PoC, not production — but it points in the right direction.
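The general shape of that idea can be sketched without reference to Plumb's internals (which are a PoC and not documented here): harvest decision records from commit metadata and queue them as candidate spec amendments for human approval. The `Decision:` trailer convention below is invented for illustration.

```python
# Generic illustration of the "feed implementation decisions back into the
# spec" idea. Not Plumb's actual mechanism: it uses an invented commit
# trailer convention instead of agent traces.
def extract_decisions(commit_message: str) -> list[str]:
    """Collect candidate spec amendments from 'Decision:' trailers."""
    return [
        line.removeprefix("Decision:").strip()
        for line in commit_message.splitlines()
        if line.startswith("Decision:")
    ]

msg = """Add refresh endpoint

Decision: refresh tokens rotate on every use
Decision: reuse detection revokes the whole token family
"""
print(len(extract_decisions(msg)))  # → 2
```

Whatever the capture mechanism, the approval step is the part that matters: decisions enter the spec only after a human confirms them.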
5. Artifact proliferation without real curation
Each feature generates multiple markdown files. The implicit promise that “the agent keeps the spec updated” does not hold — someone needs to review every spec change with the same rigor applied to code review. In practice, that does not happen. Specs accumulate, drift from the code, and become noise rather than signal.
6. Specifications do not eliminate non-determinism
The same spec can produce different implementations across different runs. Greater precision reduces variation but increases the cost of writing. And poorly written specs — the most likely outcome when the methodology is new to the team — produce well-organized code that does the wrong thing.
7. Risk of waterfall with AI in the loop
The most structural criticism comes from Thoughtworks and independent analyses: SDD as currently practiced risks being waterfall with AI in the loop. You are still defining everything upfront and hoping reality cooperates. For exploratory development with genuinely unknown requirements, context-driven approaches adapt better.
When it makes sense to use
SDD in its current form makes sense for:
- Enterprise teams developing on large, existing codebases where architectural drift is expensive
- Regulated environments where audit trails and requirements traceability are mandatory (EU AI Act, financial sector, healthcare)
- Stable domains with clear contracts: APIs, data schemas, compliance rules
- Teams with TDD or BDD maturity who want to extend that discipline to the AI layer
- Emulation and portability: the use case where SDD shines brightest is reimplementing an existing system in another language, using the original system’s tests as the specification
SDD is probably overkill for:
- Solo projects and rapid prototypes
- Exploratory development where requirements are genuinely unknown
- Small fixes or isolated bugs
- Teams without the discipline to keep specs updated after implementation
The honest diagnosis
SDD is a legitimate response to a real problem. Vibe coding generates code faster than teams can govern. Specifications are a tool for restoring that governance.
But the parallel with TDD is instructive. TDD is 25 years old, has extensive empirical evidence, and real-world adoption sits around 8% in the strict sense (writing tests before code, consistently). SDD is generating hype because it solves a visible problem in the agent era, but it inherits the same fundamental challenge: developers prefer to ship. Any methodology that adds upfront work will fight against that instinct.
The scarcity of easily findable public criticism is itself a signal. When the benefits are trivial to find and the tradeoffs are not, a methodology is still in its marketing phase, not its maturity phase. Spec-Driven Development has not yet earned its way out of that phase.
The tools worth watching: Spec Kit for open source flexibility and IDE independence, Kiro for structured IDE workflows, Tessl for the more ambitious spec-as-source vision (still unproven), and Plumb as a reference for the hardest and still unsolved problem — keeping specs alive after the initial implementation.
References
- The Rise of Spec Driven Development — Drew Breunig
- The Spec-Driven Development Triangle — Drew Breunig
- Spec-Driven Development Is Eating Software Engineering — Vishal Mysore
- Spec-Driven Development: Unpacking 2025’s key new practice — Thoughtworks
- Spec-Driven Development: When Architecture Becomes Executable — InfoQ
- Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl — Martin Fowler’s blog
- The Limits of Spec-Driven Development — Isoform
- GitHub Spec Kit
- Kiro IDE
- Tessl
- Measuring AI Impact on Experienced Developer Productivity — METR, 2025