Don't Keep Specs: Connect Them
For / Key Points
Audience: Developers who regularly use AI agents like Claude Code or Codex, or tech leads and development process designers considering adoption.
Key Points:
- The real debate isn't "specs vs. no specs"—it should be framed as three patterns: disposable / persistent-but-disconnected / connected-persistent
- Connection has four levels (visibility → sync → generation → verification), with verification serving as the strongest guardrail
- Persistent-but-disconnected specs have the worst cost-effectiveness—don't just keep specs, connect them
The Core of the "No Specs Needed" Argument Is a Middleman-Cost Argument
In today's AI coding landscape, both Claude Code and Codex are built on the premise that the agent reads code, edits files, and executes commands as it works [1][2].
Given this premise, the case for eliminating specs becomes fairly clear. What AI can read directly is code, tests, configurations, diffs, and execution results. Specs, by contrast, are a human "translation" into natural language: they cost effort to write and maintain, they tend to drift from the code, and the AI can read the source directly anyway. Specs start to look like pure middleman cost.
In other words, the core of the "no specs needed" argument isn't "I hate documentation." It's the question: if AI can read primary sources directly, isn't a human-authored intermediate translation structurally inefficient? This challenge is largely on target.
The "Specs vs. No Specs" Binary Is Slicing Along the Wrong Axis
The worst thing you can do here is reduce the discussion to a binary of "specs vs. no specs." This framing almost guarantees that the conversation will talk past itself.
Why? Because what "having specs" means varies wildly from person to person. Some mean a design document sitting untouched in an internal wiki. Others mean code generation from schema definitions. Some refer to listing spec references in agent configuration files and enforcing updates via hooks. Still others mean a combination of test contracts and policy checks. The moment you lump all of these under "having specs," the discussion spins its wheels.
The real axis to slice along is different. Spec practices in the AI era are better viewed through two dimensions:
Dimension 1: Are you treating specs as disposable, or persisting them?
Dimension 2: If persisting, are they connected to verification and constraints?
These two dimensions reveal at least three patterns:
| Pattern | What It Looks Like | Characteristics |
|---|---|---|
| Disposable Spec | Requirements are organized and plans are made on the spot, but not kept as lasting assets | Fast. Suited for exploration and prototyping |
| Persistent but Disconnected Spec | Specs are written and saved, but lack sufficient connection to tests, constraints, or audit trails | Tends to only add cost |
| Connected Persistent Spec | Specs are saved and connected to tests, contracts, policies, and audit trails | Suited for rigorous operations |
What this article criticizes is the middle pattern: "persistent but disconnected specs." It's the state where specs are written, yet they have no real effect on downstream verification.
Note that this article addresses specs intended to govern implementation through AI agents. Specs whose primary purpose is human consensus-building, onboarding, or audit trails can function perfectly well without agent connection. The following discussion focuses on the situation where you want agents to follow specs, but have no mechanism to make them do so.
Plan-Only Is Not "the Absence of Specs" but "Disposable Specs"
The term "plan-only" sounds as if specs don't exist at all, but in practice both Claude Code and Codex perform structuring activities such as requirements organization and task decomposition during their work [1][2]. A more accurate description of plan-only: specs do exist, but spec work isn't treated as a persistent asset.
You organize things on the spot. You have the AI decompose tasks if needed. But you don't later make that work the centerpiece of traceability, auditing, or regression management.
This pattern is quite rational for personal development, prototyping, and small-scale internal tools. The goal isn't "long-term governance" but "build quickly and validate." Build small, break small, fix small. For exploratory purposes, this is a coherent choice.
"Thin Spec-Driven" Here Doesn't Mean Lightweight Specs
This is the most commonly misunderstood point. "Thin spec-driven" sounds like "all lightweight specs are bad." But that's not what it means.
"Thin" here refers not to low density, but to weak connection.
Thin specs: specs are written and saved, but they aren't sufficiently connected to tests or constraints. Change impact can't be traced. As a result, they never become more than "documents to be read."
Lightweight specs: the scope is limited and the volume of specs is small, but within that scope they are connected to acceptance criteria and tests. So they function perfectly well for mid-scale application development.
In other words, being lightweight and being thin are not the same thing. With this distinction in place, "lightweight specs work fine in many mid-scale scenarios" and "thin spec-driven approaches are weak" coexist without contradiction.
Connection Has Levels
Up to this point, I've discussed connection as a binary—present or absent. But in practice, connection exists on a gradient. It can be broadly divided into four levels:
| Level | What It Guarantees | Example |
|---|---|---|
| Visibility (referenced) | The agent knows the spec exists | Listing spec references in CLAUDE.md |
| Sync (kept updated) | Spec freshness is maintained | Hooks or CI enforce spec updates before commits |
| Generation (used for generation) | Structural alignment between spec and implementation | Code generation from OpenAPI / JSON Schema / IDL |
| Verification (serves as guardrail) | Spec correctness is mechanically confirmed | spec → test cases, contract tests, policy as code |
These work in combination. Visibility → sync → verification is the main axis along which guarantee strength increases. Generation is a reinforcement mechanism that functions where formal schema definitions exist—it can be omitted and verification connections still hold.
Two things are worth noting. Even visibility alone is far better than nothing. Simply writing "read this document" in the agent's configuration file reduces the probability of the spec being ignored. The cost is nearly zero, and for many teams, this is the first step.
On the other hand, sync guarantees that the spec "has been updated" but not that it "is correct." Even with update cycles running, the content may still be wrong.
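As a rough illustration of what a sync-level check can and cannot guarantee, the sketch below flags specs whose governed code changed while the spec file itself was left untouched. The directory-to-spec mapping, the paths, and the function name `stale_specs` are all hypothetical; in a real hook or CI job the list of changed files would come from the version control system.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: source directory -> spec file that governs it.
SPEC_FOR_DIR = {
    "src/billing": "docs/specs/billing.md",
    "src/auth": "docs/specs/auth.md",
}


def stale_specs(changed_files: list[str]) -> list[str]:
    """Return specs whose governed code changed while the spec did not."""
    changed = set(changed_files)
    stale = []
    for src_dir, spec in SPEC_FOR_DIR.items():
        # Path-aware prefix check, so "src/billing_old" does not match "src/billing".
        code_touched = any(
            PurePosixPath(path).is_relative_to(src_dir) for path in changed
        )
        if code_touched and spec not in changed:
            stale.append(spec)
    return stale


print(stale_specs(["src/billing/invoice.py"]))
# ['docs/specs/billing.md']
print(stale_specs(["src/billing/invoice.py", "docs/specs/billing.md"]))
# []
```

The second call passes as soon as the spec file appears in the change set, whatever its content. That is the limit of this level: the check confirms that the spec was touched, not that what it says is correct.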
What this article criticizes as "disconnected" is the state where there isn't even visibility, or where visibility exists but neither sync nor verification has been reached. I don't intend to broadly dismiss teams that are at the visibility + sync stage as "thin."
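One way to make this boundary concrete is to write the classification down as a small function. This is only a sketch of the definitions above: the type and field names are made up for illustration, and the generation level is left out because the text does not say how it affects the boundary on its own.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    DISPOSABLE = "disposable"
    PERSISTENT_DISCONNECTED = "persistent but disconnected"
    CONNECTED_PERSISTENT = "connected persistent"


@dataclass(frozen=True)
class SpecPractice:
    """Hypothetical description of how a team handles its specs."""
    persisted: bool
    visibility: bool = False
    sync: bool = False
    verification: bool = False


def classify(practice: SpecPractice) -> Pattern:
    if not practice.persisted:
        return Pattern.DISPOSABLE
    # "Disconnected" covers both cases named in the text: no visibility at all,
    # or visibility that never reaches sync or verification.
    if not (practice.sync or practice.verification):
        return Pattern.PERSISTENT_DISCONNECTED
    return Pattern.CONNECTED_PERSISTENT


print(classify(SpecPractice(persisted=True, visibility=True)).value)
# persistent but disconnected
print(classify(SpecPractice(persisted=True, visibility=True, sync=True)).value)
# connected persistent
```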
Why Verification Connection Is Structurally Strong: A Context Engineering Perspective
To understand the differences between levels, it helps to focus on the pathway through which knowledge "takes effect" on agents.
Pathway 1: Takes effect via the context window.
The agent reads the spec document, understands it, and follows it. Visibility and sync operate through this pathway.
The characteristics of this pathway are clear: it consumes context. The more spec documents there are, the more text the agent must read, crowding out context available for the code itself and diffs. Moreover, the agent having "read" something and having "followed" it are different things—compliance is probabilistic, not deterministic.
In practical terms, once you reach a scale of dozens of modules, each with its own design specs, constraints, and change history, "make it read everything" becomes unrealistic. You can use agent search and summarization to narrow things down partially, but the problem remains that the selection of relevant specs itself becomes probabilistic.
Pathway 2: Takes effect via execution results—functioning as a guardrail.
Tests fail, schema validation rejects, hooks block the commit. The agent doesn't need to "read and understand" the spec in advance. It executes, gets stopped, reads the failure message, and fixes things.
This pathway is called a "guardrail" because rather than governing the agent's behavior through pre-loaded knowledge, it automatically stops the agent at the point of deviation. Even without relying on human code review, implementations that deviate from the spec cannot pass through.
Compared to the approach of having agents read spec documents as prerequisite knowledge every time, this pathway can defer context consumption until the point where it's needed. The agent doesn't need to hold the full spec text at all times—it only needs to look up the relevant spec or design intent when a test fails. Context consumption doesn't drop to zero, but the shift from "always holding the full text" to "referencing only what's needed upon failure" significantly improves scalability.
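The guardrail pathway can be sketched with a minimal contract check. Everything here is an assumption made for illustration: the contract shape, the spec ID `SPEC-ORDER-003`, and the field names are invented, and a real project would more likely use an established schema validator or contract-testing tool. The point is only that the failure message carries the spec reference, so the agent looks up the relevant section when it is stopped rather than holding the full spec in context beforehand.

```python
# Hypothetical contract derived from one spec section; all names are illustrative.
ORDER_CONTRACT = {
    "spec_id": "SPEC-ORDER-003",
    "required": {"order_id": str, "amount_cents": int, "currency": str},
}


class ContractViolation(AssertionError):
    """Raised when a payload deviates from the contract."""


def check_contract(payload: dict, contract: dict) -> None:
    """Raise ContractViolation listing every deviation; return None if clean."""
    problems = []
    for field, expected_type in contract["required"].items():
        if field not in payload:
            problems.append(f"missing field '{field}'")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"field '{field}' should be {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    if problems:
        # The spec ID in the message tells the agent which section to consult.
        raise ContractViolation(
            f"{contract['spec_id']} violated: " + "; ".join(problems)
        )


try:
    check_contract({"order_id": "A-1", "amount_cents": "1200"}, ORDER_CONTRACT)
except ContractViolation as err:
    print(err)
# SPEC-ORDER-003 violated: field 'amount_cents' should be int, got str; missing field 'currency'
```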
In summary, verification connection is structurally strong for three reasons.
First, it provides guardrails that mechanically confirm correctness. Second, it scales by deferring context consumption to the minimum necessary. These first two are properties of the execution-based mechanisms themselves—tests and schema validation—and the same effect can be achieved with TDD-style tests written without spec connection.
The third reason is unique to spec connection: traceability. "Why does this test exist?" "Which tests need revisiting when the spec changes?"—by linking specs to the verification system, change impact becomes trackable. If tests exist without a correspondence to specs, decisions about adding, removing, or modifying tests depend on human memory. Spec connection replaces that dependency with structure.
Why Persistent-but-Disconnected Specs Have the Worst Cost-Effectiveness
Based on the discussion so far, the reason can be organized into two points.
Reason 1: The cost of writing remains, but the spec doesn't do any work.
The disposable spec pattern leans into speed. The lasting assets it leaves behind are thin, but that's by design, so it's consistent. The connected persistent spec pattern is heavy, but that weight is converted into tests, contracts, policies, audit trails, and change impact analysis. Where rigor is needed, the weight yields a proportionate return.
The persistent-but-disconnected pattern, however, involves writing, saving, and even worrying about updates to specs—but without sufficient connection to tests or constraints. When changes occur, the impact scope can't be traced. Only the cost of writing and maintaining specs persists.
As noted earlier, this article addresses the context of wanting agents to follow specs. Persistent-but-disconnected specs do have human-facing returns—confirming design intent during reviews, onboarding, and so on. But even accounting for those, if you want agents to follow specs yet the mechanism for doing so doesn't function as a guardrail, the returns don't justify the cost of writing and maintaining them.
At this point, you might think: "Then why not just write tests and schemas? Aren't natural-language specs unnecessary after all?" But even in a world of connected specs, natural-language documents retain a unique role. Test code describes "what must be satisfied" but not "why that decision was made." Design intent, business context, the starting point for change impact—these can only be expressed in natural language, and there are situations where the agent references this information when moving from test failure to a fix.
In other words, this article's argument is not "abolish natural-language specs" but rather: "Connect natural-language specs to the verification system—if a natural-language spec isn't connected, you're better off not writing it at all."
There's also the reverse question. At the outset, I framed the core of the no-specs argument as "middleman translation cost." If that's the case, isn't transforming natural-language specs into tests, contracts, and schema definitions just another form of middleman translation cost?
But these two have a structural difference. A natural-language spec that's only read begins diverging from the code the moment it's written, and if it isn't updated, it silently rots. Tests, schemas, and contracts, on the other hand, are executed. Because they're executed, divergence between spec and code is detected as a failure the moment it occurs. In other words, transformation into an executable format isn't "middleman translation" but "transformation into the final form"—and the fact that the target autonomously maintains its freshness makes it fundamentally different from a document that's merely read.
Reason 2: It keeps consuming the context window but doesn't scale.
An approach that persists specs and has the agent read them depends on knowledge that takes effect via the context window. As specs grow, context gets crowded, and there's a risk of degraded agent work quality. And since compliance is probabilistic, adding more specs doesn't proportionally strengthen guarantees.
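A toy calculation gives a feel for why probabilistic compliance does not add up. It assumes, purely for illustration, that the agent follows each spec rule independently with the same probability; the value 0.95 is arbitrary and not a measurement of any real agent.

```python
def all_followed_probability(p_single: float, n_rules: int) -> float:
    """Probability that every one of n_rules is followed, assuming independence."""
    return p_single ** n_rules


for n in (1, 10, 50):
    print(n, round(all_followed_probability(0.95, n), 3))
# 1 0.95
# 10 0.599
# 50 0.077
```

Under this assumption, each added rule lowers the chance that the whole set is respected in a given run. Real compliance is neither independent nor uniform, so the numbers are not predictions, but the direction of the effect matches the argument above.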
If you want speed, it's more honest to go disposable. If you want rigor, the cost is justified only when you go all the way to verification connection, that is, when specs become guardrails. That's why persisting specs that have no effect on the agent offers the worst cost-effectiveness of the three patterns.
The Choice Is Not About "Philosophy" but About "Layer"
At this point, the discussion becomes fairly simple.
Personal development / prototyping → The disposable spec pattern is fine. Speed is king.
Mid-scale application development → Visibility + sync is sufficient in many scenarios. List spec references in agent configuration files and use hooks to prompt updates. Where generation is applicable (API definitions, schemas, etc.), use it aggressively. Here, "lightweight but connected" specs are effective—distinct from "thin but disconnected."
Core systems / regulated domains → Sync alone isn't enough. Specs need to function as guardrails—without verification connection, you can guarantee neither spec correctness nor context scalability. However, building verification requires upfront cost: linking specs to test cases, designing contract tests, establishing policy as code—these don't come together overnight. That's precisely why you need the judgment to identify where it's needed and strengthen connections incrementally.
Therefore, the real debate isn't "spec-driven or not." The true question is: if you're going to treat spec work as a persistent asset, what level of connection will you maintain?
Conclusion
If AI can read code directly, specs look like middleman cost. That instinct isn't wrong.
However, that doesn't mean "don't write specs." What truly needs to be distinguished is whether specs are disposable or persistent, and if persistent, to what level they're connected to verification and constraints.
Connection has levels. Even just being referenced by an agent has an effect. If updates are enforced, freshness is maintained. But to mechanically confirm correctness as a guardrail and ensure context scalability, verification connection becomes necessary.
The worst cost-effectiveness of all belongs to specs that are written and kept yet have not reached any level of connection. Specs don't become assets just by being kept. Specs meant to steer AI agents, at least, only do their job once they're connected.
This logic of "connection" isn't limited to specs. Architecture decisions, coding standards, review criteria—the same structure applies to any knowledge you want agents to consistently follow. Writing a document and mechanically enforcing it are separate endeavors. Designing the latter is what will become the center of development process design in the agent era.
Related Articles
- Spec-Driven Development in the AI Era: Stop the Drift with Output Contracts
- Specs vs. Code vs. Prompts: Documentation Strategy for the AI Era
- A Practical Guide to Spec Management with Scrum and AI Coding
- Spec-Driven Development / Testing (SDD + TDD) Hub
[1] Claude Code is officially described as "an agentic coding tool that reads your codebase, edits files, and runs commands." https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview
[2] Codex is officially positioned as "the command center for agentic coding." https://openai.com/index/introducing-codex/