Spec-Driven Development in the AI Era: Stop Stalling with Output Contracts
TL;DR: Fix the output contract (a consistent output shape) first, then define acceptance criteria for pass/fail. With both in place, AI collaboration is far less likely to stall.
The more you delegate implementation to AI, the easier it is to stall without knowing why. The root cause is rarely skill; it is the absence of intermediate outcomes that can be judged. This article generalizes the lessons into three principles: output contracts, visibility, and acceptance.
Target Audience
- People who want to build with AI but often lose direction mid-project
Key Points
- A clear model of why AI projects stall
- A restartable workflow centered on output contracts
- A clean separation of spec, design, and acceptance
Why projects stall: it is structural
No intermediate outcomes
You have only a goal and a starting point. Without milestones that can be evaluated, nobody knows what “done” means for the next step.
Foundation work comes before visibility
When the base is built first, visible progress is delayed. A long invisible phase kills momentum.
No pass/fail oracle
If there is no fixed input and expected output, correctness cannot be judged. Reviews stop, and improvements stop with them.
Stalling is a design problem, not a capability problem
Without checkable outcomes, both AI and humans get stuck.
Three principles that make restarts possible
Principle 1: Fix the output contract
Lock the shape of the output before you lock the logic. Field meanings and slots should remain stable while the internals evolve.
The minimum contract is simply “the same slots every time.” A short template is enough.
{
  "as_of": "YYYY-MM-DD",
  "decision": "Proceed|Caution|Pause",
  "summary": "...",
  "reasons": ["...", "...", "..."],
  "proposal": "...",
  "confidence": 0.0,
  "data_health": {"freshness_hours": 0, "missing": []},
  "version": "brief.v1"
}
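One way to hold "the same slots every time" in practice is a small check that every output carries exactly the template's keys. The following is a minimal Python sketch under stated assumptions: the name `check_contract`, the type rules, and the error messages are illustrative and do not come from the article.

```python
# Hypothetical contract check: slot names come from the template above,
# but the type rules and messages are illustrative assumptions.
REQUIRED_SLOTS = {
    "as_of": str,
    "decision": str,
    "summary": str,
    "reasons": list,
    "proposal": str,
    "confidence": (int, float),
    "data_health": dict,
    "version": str,
}
ALLOWED_DECISIONS = {"Proceed", "Caution", "Pause"}


def check_contract(output: dict) -> list[str]:
    """Return contract violations; an empty list means the shape is intact."""
    problems = []
    # Every required slot must be present with a compatible type.
    for slot, expected_type in REQUIRED_SLOTS.items():
        if slot not in output:
            problems.append(f"missing slot: {slot}")
        elif not isinstance(output[slot], expected_type):
            problems.append(f"wrong type for slot: {slot}")
    # Slots outside the contract count as drift too.
    for slot in output:
        if slot not in REQUIRED_SLOTS:
            problems.append(f"unexpected slot: {slot}")
    # The decision must be one of the three values named in the template.
    decision = output.get("decision")
    if isinstance(decision, str) and decision not in ALLOWED_DECISIONS:
        problems.append(f"unknown decision: {decision}")
    return problems
```

Running a check like this on every generated output lets the internals change freely while any change to the shape shows up immediately.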
Principle 2: Review outputs, not code
People can judge outputs faster than code. Keep change units small enough to compare before/after outputs at a glance.
The smallest acceptance set is a fixed input, an expected output, and a diff rule. That is enough to start.
- Input: fixtures/input_case_01.json
- Expected: expected/output_case_01.json
- Rule: when a diff appears, record the reason in the output's reasons field
Concrete acceptance example
- "decision": "Proceed"
+ "decision": "Caution"
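A diff like the one above can be produced mechanically. The sketch below compares an expected and an actual output slot by slot; `diff_outputs` is a hypothetical helper name, and comparing only top-level slots is a simplifying assumption.

```python
import json


def diff_outputs(expected: dict, actual: dict) -> list[str]:
    """List top-level slots whose values differ, as -/+ line pairs.

    Assumption for this sketch: a slot absent on one side is shown as null.
    """
    lines = []
    for key in sorted(set(expected) | set(actual)):
        old, new = expected.get(key), actual.get(key)
        if old != new:
            lines.append(f'- "{key}": {json.dumps(old)}')
            lines.append(f'+ "{key}": {json.dumps(new)}')
    return lines


def is_accepted(expected: dict, actual: dict) -> bool:
    """Pass only when the diff is empty."""
    return not diff_outputs(expected, actual)
```

In use, `expected` would be loaded from the expected-output file with `json.loads(Path(...).read_text())` and `actual` would be the output generated from the fixture input. A non-empty diff then prompts the reviewer to record a reason, as the rule above requires.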
Principle 3: Track work by “empty slots”
Manage tasks as missing output slots, not features. This keeps next actions obvious and reduces thrashing.
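The task list can then be derived from the output itself. This is a rough sketch that assumes an unfilled slot is marked with `None` or the template's `"..."` placeholder; the function name `empty_slots` and that placeholder convention are assumptions for illustration.

```python
PLACEHOLDERS = (None, "...")  # assumed markers for "not filled in yet"


def empty_slots(output: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of slots that still hold a placeholder."""
    found = []
    for key, value in output.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as data_health.
            found.extend(empty_slots(value, prefix=f"{path}."))
        elif isinstance(value, list):
            # A list counts as unfilled if any item is still a placeholder.
            if any(item in PLACEHOLDERS for item in value):
                found.append(path)
        elif value in PLACEHOLDERS:
            found.append(path)
    return found
```

Each returned path is a candidate next task, so "what do we do next" becomes a lookup rather than a discussion.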
A short-cycle roadmap
- Handcraft a few days of outputs to align on the “right shape”
- Fill the minimum visible fields first
- Expose data health early and define missing-data behavior
- Add one indicator at a time and stabilize explanation templates
- Connect judgment and proposals gradually to avoid sharp swings
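For the data-health step in the roadmap above, the `data_health` slot can be filled by a very small function. This sketch assumes that "missing" means an input value of `None` and that freshness is measured in whole hours since the last update; both are illustrative choices, not definitions from the article.

```python
from datetime import datetime, timezone


def data_health(last_updated: datetime, values: dict, now: datetime) -> dict:
    """Build the data_health slot from input timestamps and values.

    Assumptions: a value of None means "missing"; freshness is whole
    hours elapsed since last_updated, rounded down.
    """
    age = now - last_updated
    return {
        "freshness_hours": int(age.total_seconds() // 3600),
        "missing": sorted(name for name, value in values.items() if value is None),
    }
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)` in production) keeps the function deterministic, so it can be covered by the same fixed-input acceptance checks as the rest of the output.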
Rolling back after changing the output contract
Increment the version field (e.g., brief.v1 → brief.v2) and archive old expected outputs in expected/v1/. If diffs grow too large, roll back to the previous version and reassess.
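The version bump and archive location can be computed from the version string. The sketch below assumes the `name.vN` format seen in `brief.v1` and the `expected/vN/` layout mentioned above; the helper names `bump_version` and `archive_dir` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed version format: "<name>.v<number>", e.g. "brief.v1".
_VERSION = re.compile(r"^(?P<name>[\w-]+)\.v(?P<num>\d+)$")


def _parse(version: str) -> re.Match:
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version format: {version!r}")
    return match


def bump_version(version: str) -> str:
    """Return the next contract version, e.g. brief.v1 -> brief.v2."""
    match = _parse(version)
    return f"{match['name']}.v{int(match['num']) + 1}"


def archive_dir(version: str) -> PurePosixPath:
    """Return the directory where this version's expected outputs are archived."""
    match = _parse(version)
    return PurePosixPath("expected") / f"v{match['num']}"
```

Rolling back then amounts to pointing the acceptance check at the previous version's archive directory again.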
AI request example
Requests can be short, but stating deliverables, non-goals, and forbidden changes reduces drift.
【Purpose】Auto-judge the decision field in the daily report
【Deliverable】Output matching fixtures/input_01.json → expected/output_01.json
【Non-goal】UI implementation, data fetching logic
【Forbidden changes】JSON field names, version field format
【Acceptance】Pass if diff is empty; if diff exists, append reason to reasons field
Failure patterns and how to avoid them
- Oversized specs: split until change units are small
- Invisible progress: prioritize visible outputs
- No acceptance checks: judge by output diffs
- Uncertain domains: switch to a hypothesis-and-validation loop
Summary
Spec-driven development in the AI era is not about thicker specs; it is about stronger output contracts and acceptance. When outcomes are visible and judgeable, progress resumes. Start by defining the single output you want to see every day.