Spec-Driven Development in the AI Era: Stop Stalling with Output Contracts

TL;DR: Fix the output contract (a consistent output shape) first, then define pass/fail acceptance criteria. With both in place, AI collaboration is far less likely to stall.

The more you delegate implementation to AI, the easier it is to stall without knowing why. The root cause is rarely skill; it is the absence of intermediate outcomes that can be judged. This article generalizes the lessons into three principles: output contracts, visibility, and acceptance.

Target Audience

  • People who want to build with AI but often lose direction mid-project

Key Points

  1. A clear model of why AI projects stall
  2. A restartable workflow centered on output contracts
  3. A clean separation of spec, design, and acceptance

Why projects stall: it is structural

No intermediate outcomes

You only have a goal and a starting point. Without milestones that can be evaluated, nobody knows what “done” means for the next step.

Foundation work comes before visibility

When the base is built first, visible progress is delayed. A long invisible phase kills momentum.

No pass/fail oracle

If there is no fixed input and expected output, correctness cannot be judged. Reviews stop, and improvements stop with them.

Stalling is a design problem, not a capability problem

Without checkable outcomes, both AI and humans get stuck.

Three principles that make restarts possible

Principle 1: Fix the output contract

Lock the shape of the output before you lock the logic. Field meanings and slots should remain stable while the internals evolve.

The minimum contract is simply “the same slots every time.” A short template is enough.

{
  "as_of": "YYYY-MM-DD",
  "decision": "Proceed|Caution|Pause",
  "summary": "...",
  "reasons": ["...", "...", "..."],
  "proposal": "...",
  "confidence": 0.0,
  "data_health": {"freshness_hours": 0, "missing": []},
  "version": "brief.v1"
}
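A contract like this can be checked mechanically. The following is a minimal sketch, assuming the slot names and types shown in the template above; the function name contract_violations and the specific checks are illustrative, not part of any established tool.

```python
# Slot names and types mirror the template above; the checks are illustrative.
REQUIRED_SLOTS = {
    "as_of": str,
    "decision": str,
    "summary": str,
    "reasons": list,
    "proposal": str,
    "confidence": (int, float),  # JSON has one number type, so accept both
    "data_health": dict,
    "version": str,
}
ALLOWED_DECISIONS = {"Proceed", "Caution", "Pause"}


def contract_violations(brief: dict) -> list[str]:
    """Return human-readable contract violations; an empty list means OK."""
    problems = []
    for slot, expected_type in REQUIRED_SLOTS.items():
        if slot not in brief:
            problems.append(f"missing slot: {slot}")
        elif not isinstance(brief[slot], expected_type):
            problems.append(f"wrong type for {slot}: {type(brief[slot]).__name__}")
    # Extra slots also break "the same slots every time".
    for slot in sorted(set(brief) - set(REQUIRED_SLOTS)):
        problems.append(f"unexpected slot: {slot}")
    decision = brief.get("decision")
    if isinstance(decision, str) and decision not in ALLOWED_DECISIONS:
        problems.append(f"unknown decision: {decision}")
    return problems
```

Running a check of this kind on every generated output keeps the shape stable while the logic behind each slot is still changing.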

Principle 2: Review outputs, not code

People can judge outputs faster than code. Keep change units small enough to compare before/after outputs at a glance.

The smallest acceptance set is fixed inputs, expected outputs, and a diff rule. That is enough to start.

  • Input: fixtures/input_case_01.json
  • Expected: expected/output_case_01.json
  • Rule: when a diff appears, record the reason in the output's reasons field

Concrete acceptance example

- "decision": "Proceed"
+ "decision": "Caution"

Review comment example: "confidence dropped below 0.6, so changed to Caution. Rationale added to reasons field."
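The acceptance check itself can be a few lines of code. This is a rough sketch, assuming outputs are flat JSON objects compared slot by slot; the names diff_slots and review_status are hypothetical, and treating a diff as "explained" only when the reasons slot also changed is one possible reading of the rule above.

```python
import json
from pathlib import Path

MISSING = "<missing>"  # stand-in for a slot that is absent on one side


def load_json(path: str) -> dict:
    """Read one fixture or expected-output file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def diff_slots(expected: dict, actual: dict) -> dict:
    """Map each top-level slot that differs to its (expected, actual) pair."""
    slots = sorted(set(expected) | set(actual))
    return {
        slot: (expected.get(slot, MISSING), actual.get(slot, MISSING))
        for slot in slots
        if expected.get(slot, MISSING) != actual.get(slot, MISSING)
    }


def review_status(expected: dict, actual: dict) -> str:
    """Classify a run as 'pass', 'needs-review', or 'unexplained'."""
    changes = diff_slots(expected, actual)
    if not changes:
        return "pass"
    # Assumption: a diff is only reviewable if the reasons slot changed too.
    return "needs-review" if "reasons" in changes else "unexplained"
```

For example, review_status(load_json("expected/output_case_01.json"), actual) would compare a freshly generated output against the stored expectation. An "unexplained" result means the output changed without a recorded reason, which is the case the diff rule is meant to catch.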

Principle 3: Track work by “empty slots”

Manage tasks as missing output slots, not features. This keeps next actions obvious and reduces thrashing.
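One way to make this concrete is to derive the task list directly from a draft output. The sketch below assumes that an unfilled slot is either absent or null in the draft; that convention, and the names empty_slots and as_task_list, are illustrative choices rather than something the contract prescribes.

```python
# Top-level slots from the output contract template.
SLOTS = [
    "as_of", "decision", "summary", "reasons",
    "proposal", "confidence", "data_health", "version",
]


def empty_slots(draft: dict) -> list[str]:
    """Return the slots that are absent or still None, in contract order."""
    return [slot for slot in SLOTS if draft.get(slot) is None]


def as_task_list(draft: dict) -> list[str]:
    """Render each empty slot as one checklist line."""
    return [f"[ ] fill '{slot}'" for slot in empty_slots(draft)]


draft = {"as_of": "2024-05-01", "summary": "Stable week.", "version": "brief.v1"}
for line in as_task_list(draft):
    print(line)
```

Each remaining line is a next action, and the list shrinks as slots get real values.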

A short-cycle roadmap

  • Handcraft a few days of outputs to align on the “right shape”
  • Fill the minimum visible fields first
  • Expose data health early and define missing-data behavior
  • Add one indicator at a time and stabilize explanation templates
  • Connect judgment and proposals gradually to avoid sharp swings

Rolling back after changing the output contract

Increment the version field (e.g., brief.v1 → brief.v2) and archive old expected outputs in expected/v1/. If diffs grow too large, roll back to the previous version and reassess.
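The version bump and the archive step can be scripted so that a rollback is just a matter of restoring a folder. This is a sketch under assumptions: version strings follow the name.vN pattern from the example, expected outputs are *.json files directly inside one directory, and both function names are hypothetical.

```python
import re
import shutil
from pathlib import Path


def next_version(version: str) -> str:
    """Increment the trailing number, e.g. 'brief.v1' becomes 'brief.v2'."""
    match = re.fullmatch(r"(.+\.v)(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version format: {version!r}")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1}"


def archive_expected(expected_dir: Path, old_version: str) -> Path:
    """Move the current expected/*.json files into expected/<vN>/."""
    tag = old_version.rsplit(".", 1)[-1]  # 'brief.v1' -> 'v1'
    archive = expected_dir / tag
    archive.mkdir(parents=True, exist_ok=True)
    # Materialize the file list first so moving files does not affect iteration.
    for path in sorted(expected_dir.glob("*.json")):
        shutil.move(str(path), str(archive / path.name))
    return archive
```

Calling archive_expected(Path("expected"), "brief.v1") before regenerating expectations for next_version("brief.v1") leaves the old baseline in expected/v1/, ready to be copied back if the new contract is abandoned.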

AI request example

Requests can be short, but stating deliverables, non-goals, and forbidden changes reduces drift.

【Purpose】Auto-judge the decision field in the daily report
【Deliverable】Output matching fixtures/input_01.json → expected/output_01.json
【Non-goal】UI implementation, data fetching logic
【Forbidden changes】JSON field names, version field format
【Acceptance】Pass if diff is empty; if diff exists, append reason to reasons field

Failure patterns and how to avoid them

  • Oversized specs: split until change units are small
  • Invisible progress: prioritize visible outputs
  • No acceptance checks: judge by output diffs
  • Uncertain domains: switch to a hypothesis-and-validation loop

Summary

Spec-driven development in the AI era is not about thicker specs; it is about stronger output contracts and acceptance. When outcomes are visible and judgeable, progress resumes. Start by defining the single output you want to see every day.