
The way we define work in software engineering has not kept pace with how we execute it. We are asking AI agents to implement features from user stories that were designed for human interpretation. Then we wonder why the output is inconsistent, incomplete, or wrong.

User stories were a useful innovation for their time. “As a user, I want X so that Y” gave development teams a template for capturing intent. But intent is not enough for agent-driven execution. Agents do not infer context from body language in a standup meeting. They do not absorb domain conventions from years of working at the same company. They do not fill in the gaps the way an experienced developer does by pattern-matching against past projects.

Agents need specifications. Formal, precise, structured specifications that define what to build, how it should behave, what constraints it must respect, and how success is measured. Spec-driven development is not a process preference. It is the backbone of agentic engineering, the structural foundation without which agent-driven delivery is unreliable.

The Problem With User Stories

User stories were designed for a specific context: agile teams where a product manager sits with developers, discusses requirements, answers questions, and iterates on understanding through conversation. The story itself was never intended to be a complete specification. It was a conversation starter.

This works when the implementer is a human who can ask questions, draw on experience, and exercise judgment about unstated requirements. It does not work when the implementer is an agent that treats the story as a complete specification, because it has no mechanism for knowing what is missing.

Consider a typical user story:

As a customer, I want to search for products by name so that I can find what I am looking for quickly.

A human developer reads this and brings a mountain of implicit context: they know the existing search infrastructure, the database schema, the frontend component library, the performance requirements, the accessibility standards, the testing conventions, and the deployment pipeline. They also know to ask about edge cases: what happens with misspellings? Partial matches? Empty results? Special characters?

An agent reads the same story and produces something. Maybe it is a basic search function. Maybe it implements full-text search. Maybe it hits the database directly without a search index. Maybe it ignores error handling. Maybe it creates a new API endpoint that does not follow the project’s routing conventions. The output is unpredictable because the input is ambiguous.

The solution is not to write better prompts around user stories. The solution is to replace user stories with specifications.

What a Spec-Driven Specification Looks Like

A specification for agentic development is a formal document that encodes enough information for an agent to produce correct, complete, and convention-compliant output without additional human guidance.

Here is the anatomy of an effective spec:

Outcome Reference

Every specification starts with the business outcome it serves. This is not decorative. It provides the agent (and the human reviewer) with the evaluative context for judging the implementation. If the outcome is “reduce product search abandonment from 23% to 12%,” that frames every design decision differently than if the outcome is “add a search feature.”

outcome:
  metric: product_search_abandonment_rate
  current: 0.23
  target: 0.12
  timeline: 90 days post-deployment

Functional Requirements

Functional requirements define what the system must do. They are stated in terms of inputs, processing rules, and expected outputs. Precision matters: every requirement should be testable.

Poor: “The system should search for products.”

Better:

functional_requirements:
  - id: FR-001
    description: "Accept a search query string and return matching products"
    input:
      query: string, 1-200 characters, UTF-8
      page: integer, >= 1, default 1
      page_size: integer, 10-100, default 20
    processing:
      - Tokenize query into terms
      - Match against product name, description, and SKU fields
      - Apply fuzzy matching with Levenshtein distance <= 2
      - Rank results by relevance score (TF-IDF)
      - Apply active/in-stock filter (only return purchasable products)
    output:
      results: array of product objects (id, name, price, image_url, relevance_score)
      total_count: integer
      page: integer
      page_size: integer

This level of precision eliminates interpretation. The agent knows exactly what the input format is, what processing steps to implement, what matching algorithm to use, and what the output structure looks like.
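
One way to see how little room this leaves for interpretation: the input and output sections of FR-001 translate almost mechanically into types. Here is a minimal TypeScript sketch (the interface names are illustrative, not part of the spec; the field names mirror the requirement):

// Hypothetical shapes derived from FR-001.
interface SearchInput {
  query: string;       // 1-200 characters, UTF-8
  page?: number;       // integer >= 1, defaults to 1
  page_size?: number;  // integer 10-100, defaults to 20
}

interface ProductResult {
  id: string;
  name: string;
  price: number;
  image_url: string;
  relevance_score: number;
}

interface SearchOutput {
  results: ProductResult[];
  total_count: number;
  page: number;
  page_size: number;
}

An agent, or a human reviewer, can check generated code against these shapes directly, which is exactly the point of specifying inputs and outputs at this granularity.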

Non-Functional Requirements

Non-functional requirements define the quality attributes the implementation must satisfy. These are the constraints that distinguish production-ready code from prototype code.

non_functional_requirements:
  performance:
    - p95 response time <= 200ms for queries against 1M product catalog
    - System must handle 500 concurrent search requests
  security:
    - Query input must be sanitized against injection attacks
    - Search results must respect product visibility permissions
  accessibility:
    - Search results must include ARIA labels for screen readers
    - Keyboard navigation must support result selection
  scalability:
    - Search index must support incremental updates without downtime
    - Architecture must support horizontal scaling of search service
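
As one example of what the security requirement implies at the code level, a query sanitizer might look like the sketch below. This assumes an Elasticsearch-style query parser in which these characters act as operators; the real rules depend on the search backend and should be paired with parameterized queries rather than string concatenation.

// Illustrative sanitizer for the query input defined in FR-001.
const MAX_QUERY_LENGTH = 200;

export function sanitizeQuery(raw: string): string {
  const trimmed = raw.trim().slice(0, MAX_QUERY_LENGTH);
  // Strip characters with special meaning in common full-text query syntaxes,
  // then collapse the resulting whitespace.
  return trimmed
    .replace(/[+\-=&|!(){}\[\]^"~*?:\\\/<>]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}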

Interface Contracts

Interface contracts define exactly how the new component interacts with the existing system. This is where many agent-generated implementations fail: they create technically correct code that does not fit into the existing system architecture.

interface_contracts:
  api:
    method: GET
    path: /api/v2/products/search
    authentication: Bearer token (existing auth middleware)
    rate_limit: 100 requests/minute per user
    response_format: JSON, following existing API envelope pattern
      {
        "data": [...],
        "meta": { "total": N, "page": N, "page_size": N },
        "errors": []
      }
  database:
    search_index: Elasticsearch (existing cluster, index: products_v3)
    read_replica: Use read replica for search queries (connection: db_read)
  events:
    publish: "product.searched" event to existing event bus on each query
    payload: { query, result_count, user_id, timestamp }

Because the specification pins down the exact API pattern, the database connection, the event bus integration, and the authentication approach, the agent’s output integrates cleanly with the existing system. Without these contracts, the agent would have to guess, and guesses at the integration layer are where most agent-generated code fails.
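
For illustration, a Node.js/TypeScript route handler that honors this contract might look like the sketch below. Express, requireAuth, and searchProducts are stand-ins for the project’s real framework, auth middleware, and service layer; only the path, method, and envelope come from the spec.

import express, { RequestHandler } from "express";

// Placeholder declarations for the existing middleware and search service.
declare const requireAuth: RequestHandler;
declare function searchProducts(
  query: string,
  page: number,
  pageSize: number
): Promise<{ results: unknown[]; totalCount: number }>;

const router = express.Router();

// GET /api/v2/products/search, per the interface contract above.
router.get("/api/v2/products/search", requireAuth, async (req, res) => {
  const query = String(req.query.query ?? "");
  const page = Number(req.query.page ?? 1);
  const pageSize = Number(req.query.page_size ?? 20);

  const { results, totalCount } = await searchProducts(query, page, pageSize);

  // Wrap the payload in the existing API envelope pattern.
  res.json({
    data: results,
    meta: { total: totalCount, page, page_size: pageSize },
    errors: [],
  });
});

export default router;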

Validation Criteria

Validation criteria define what “done” means in testable terms. Each criterion maps to an automated test that the implementation must pass.

validation_criteria:
  - id: VC-001
    description: "Returns correct results for exact product name match"
    test: Search for "Blue Widget" returns product ID 12345 as first result
  - id: VC-002
    description: "Handles fuzzy matching for misspellings"
    test: Search for "Blu Widgit" returns product ID 12345 within top 3 results
  - id: VC-003
    description: "Returns empty results gracefully"
    test: Search for "xyznonexistent" returns empty array with 200 status
  - id: VC-004
    description: "Respects pagination"
    test: Search with page=2, page_size=10 returns items 11-20 of result set
  - id: VC-005
    description: "Meets performance requirement"
    test: p95 latency under 200ms with 1M products and 500 concurrent requests
  - id: VC-006
    description: "Sanitizes injection attempts"
    test: Query containing SQL/NoSQL injection patterns returns safe empty results

These validation criteria serve a dual purpose: they tell the agent what success looks like, and they define the automated test suite that the agent’s output must pass to clear the quality gate.
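
To make that mapping concrete, a criterion like VC-003 might become a test roughly like the following. Jest and supertest are assumed here as the test stack, and the app import is hypothetical; the assertions themselves come straight from the criterion and the envelope contract.

import request from "supertest";
import app from "../src/app"; // hypothetical application entry point

describe("GET /api/v2/products/search", () => {
  // VC-003: a query with no matches returns an empty array with a 200 status.
  it("returns empty results gracefully", async () => {
    const res = await request(app)
      .get("/api/v2/products/search")
      .query({ query: "xyznonexistent" })
      .set("Authorization", "Bearer test-token");

    expect(res.status).toBe(200);
    expect(res.body.data).toEqual([]);
    expect(res.body.meta.total).toBe(0);
  });
});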

Domain Context

The final section provides the broader context that the agent needs to produce output that fits the domain and the project.

domain_context:
  architecture: "Microservices, Node.js services, Elasticsearch for search"
  conventions:
    - "Use existing error handling middleware (see /lib/errors.js)"
    - "Follow existing service structure (see /services/catalog for reference)"
    - "Use existing logging library (winston, structured JSON)"
  related_specifications:
    - "Product catalog schema: /specs/product-catalog.md"
    - "Authentication system: /specs/auth-service.md"
    - "API conventions: /specs/api-standards.md"
  historical_decisions:
    - "Elasticsearch chosen over Solr in 2024 for cost and operational simplicity"
    - "Product visibility permissions are enforced at the API layer, not the search index"

The Spec Development Process

Writing specifications of this quality takes time and skill. It is not busywork. It is the primary intellectual activity in AI-native engineering. Here is the process we use at CONFLICT within our HiVE methodology:

Step 1: Outcome Alignment

Before writing any specification, clarify the business outcome. What metric are we trying to move? By how much? By when? This grounds every subsequent decision and prevents the common failure of building something technically impressive that does not serve the business.

Step 2: Domain Research

Gather the domain context. Interview stakeholders. Review existing documentation. Examine the existing codebase. Map the integration points. Understand the business rules. This research phase typically takes 20-30% of the total specification time, and it is the most valuable investment because it surfaces the non-obvious requirements that would otherwise become production bugs.

Step 3: Draft Specification

Write the first draft of the specification. Start with functional requirements, then add non-functional requirements, interface contracts, validation criteria, and domain context. Use the structured format described above. Do not worry about perfection in the first draft. Focus on completeness.

Step 4: Review and Refinement

Review the specification with two audiences: domain experts (who validate that the business logic is correct) and engineers (who validate that the technical requirements are feasible and complete). Incorporate feedback. Resolve ambiguities. Add missing edge cases.

Step 5: Validation Criteria Development

Develop the detailed validation criteria and, ideally, the test cases themselves. Some teams write the test cases as part of the specification. Others generate them from the validation criteria using a separate agent. Either way, the validation criteria must be complete before implementation begins.

Step 6: Agent Execution

With the specification complete, hand it to the agent for implementation. The agent produces code that satisfies the functional requirements, respects the non-functional requirements, conforms to the interface contracts, and passes the validation criteria. The specification is the contract. The agent’s job is to fulfill it.

Step 7: Verification

Human engineers review the agent’s output against the specification. Automated tests verify the validation criteria. Integration tests verify the interface contracts. Performance tests verify the non-functional requirements. Any failures are diagnosed, the specification is refined if necessary, and the agent regenerates.

Why This Approach Scales

Spec-driven development scales in ways that story-driven development does not, precisely because specifications are formal enough to be reused, composed, and versioned.

Reusability. Well-written specifications become organizational assets. The search specification above can be adapted for other searchable entities with minimal modification. Over time, a library of specifications accumulates that accelerates every subsequent project.

Composability. Complex systems are built by composing specifications. Each specification defines a component with clear interfaces. The system-level architecture defines how components connect. This compositional approach lets agents build components in parallel while maintaining integration integrity.

Versioning. Specifications can be versioned and diffed just like code. When requirements change, the specification is updated, the diff is reviewed, and the agent regenerates only the affected components. This makes change management tractable even in large systems.

Onboarding. New team members understand the system by reading specifications, not by reverse-engineering code. Specifications are documentation by default. They describe not just what the system does but why it does it (through outcome references) and how success is measured (through validation criteria).

Quality measurement. Because each specification includes validation criteria, quality is measurable at every level. You can track what percentage of specifications are implemented correctly on the first pass, what common failure patterns exist, and where your specifications need improvement. This data drives continuous improvement of both the specifications and the agents.

The Investment Case

Spec-driven development requires more upfront investment than story-driven development. A specification takes hours to write, whereas a user story takes minutes. This cost is real, and it is worth addressing directly.

The return comes in three forms:

Reduced rework. Precise specifications produce correct implementations more often. In our experience, a well-specified feature has an 80-95% first-pass success rate with agent implementation, compared to 30-50% for story-level descriptions. The time saved on rework vastly exceeds the time spent on specification.

Faster agent execution. Agents work faster with precise specifications because they spend less time on interpretation and more time on implementation. A feature that takes an agent two hours to implement from a specification might take eight hours of back-and-forth prompt engineering from a story.

Compounding returns. Every specification improves the organizational library. Every agent execution generates data about specification effectiveness. The system gets faster and more reliable with every iteration.

The breakeven point is typically two to three projects. By the third project using spec-driven development, the specification library, the team’s specification-writing skill, and the accumulated domain models make the upfront cost negligible compared to the delivery speed and quality improvements.

Getting Started

If you are currently operating with user stories and want to transition to spec-driven development:

  1. Pick one feature in your current backlog.
  2. Write a specification for it using the format above.
  3. Give it to an agent and compare the output to what you would have gotten from a user story.
  4. Measure the difference in quality, completeness, and integration correctness.
  5. Use that evidence to build the case for broader adoption.

The transition is not all-or-nothing. Start with specifications for the most complex or most critical features, where the value of precision is highest. Expand as your team builds the skill and the specification library grows.

Spec-driven development is not a process innovation for its own sake. It is the necessary adaptation to a world where agents are primary implementers and the quality of the instructions determines the quality of the output. User stories were designed for human conversation. Specifications are designed for agent execution. The delivery model has changed. The input format must change with it.