# Practical Guide to Enforcing Spec-Based Development with AI Agents: Claude Code & GitHub Copilot Complete Setup [2025 Edition]
## Key Points
- Technically enforce the spec → design → implementation flow to prevent vibe coding chaos
- Set gate conditions in Claude Code and GitHub Copilot to block implementation without approval
- Complete development process control through instruction files, permissions, and hooks
- Quality assurance through agent role separation and permission restrictions
> **Important: Why "Enforcement" Is Necessary**
> AI agents tend to treat instructions as recommendations rather than rules. Without a technical enforcement mechanism, the workflow drifts back to vibe coding. This article shows concrete techniques for blocking implementation until the earlier phases are approved.
## 🎯 Overall Design: Spec-Based Development Enforcement Architecture
### Phase Gate Approach
```mermaid
graph LR
    A[📋 SPEC] -->|Approval Flag| B[🏗️ DESIGN]
    B -->|Approval Flag| C[📝 TASKS]
    C -->|Approval Flag| D[💻 IMPLEMENTATION]
    D -->|Auto Validation| E[✅ TEST/PR]
    A -.->|❌ Blocked| D
    B -.->|❌ Blocked| D
    style A fill:#ffe0e0
    style B fill:#fff0e0
    style C fill:#e0f0ff
    style D fill:#e0ffe0
    style E fill:#f0e0ff
```
### Enforcement Mechanism Comparison
| Tool | Enforcement Level | Mechanism | Effect |
|---|---|---|---|
| Claude Code | ⭐⭐⭐⭐⭐ | Hooks + Permissions | Complete blocking possible |
| GitHub Copilot | ⭐⭐⭐⭐ | Instructions + Firewall | Strong guidance + restrictions |
| Traditional Instructions | ⭐⭐ | README/Comments | Easily ignored |
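The gate logic in the diagram reduces to one rule: a phase is reachable only when every earlier flag exists. The sketch below expresses that rule under two assumptions. It uses the flag paths from this guide, and it treats `docs/tasks/todo.yaml` as the TASKS gate, as the CLAUDE.md Phase 4 prerequisite does. The gatekeeper hook below is more lenient and only warns about a missing `todo.yaml`. `current_phase` is a hypothetical helper name.

```python
from pathlib import Path

# Flag files in gate order, paired with the phase each one unlocks
GATES = [
    ("docs/spec/.approved", "DESIGN"),
    ("docs/design/.approved", "TASKS"),
    ("docs/tasks/todo.yaml", "IMPLEMENTATION"),
]


def current_phase(root: str = ".") -> str:
    """Return the furthest phase whose prerequisites are all present.

    Gates are checked in order and the scan stops at the first missing
    flag, so a later flag never lets an earlier phase be skipped.
    """
    phase = "SPEC"
    for flag, unlocked in GATES:
        if not (Path(root) / flag).exists():
            break
        phase = unlocked
    return phase
```

Stopping at the first missing flag mirrors the "❌ Blocked" edges: a stray `docs/design/.approved` does not help while the spec is still unapproved.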
## 🔒 Claude Code: Complete Control Implementation
### 1. CLAUDE.md - Memory and Behavioral Rules
Claude Code loads `CLAUDE.md` into context at the start of each session, so this file tells the agent what the rules are. On its own it does not block anything; the permissions and hooks in the following steps do the actual blocking.
````markdown
# Project Operating Rules (Spec-first ENFORCED)
## ⚠️ CRITICAL: Phase Gates
### Phase 1: SPECIFICATION
- Location: `docs/spec/requirements.md`
- Approval Flag: `docs/spec/.approved`
- **BLOCKED ACTIONS**: Edit, Write, Bash (except spec-related)
### Phase 2: DESIGN
- **PREREQUISITE**: `docs/spec/.approved` MUST exist
- Location: `docs/design/architecture.md`
- Approval Flag: `docs/design/.approved`
- **BLOCKED ACTIONS**: Code editing, test execution
### Phase 3: TASKING
- **PREREQUISITE**: `docs/design/.approved` MUST exist
- Location: `docs/tasks/todo.yaml`
- Format: YAML with status field (todo/in-progress/done)
### Phase 4: IMPLEMENTATION
- **PREREQUISITE**: All above flags MUST exist
- **NOW ALLOWED**: Edit, Write source files
- **STILL BLOCKED**: Direct deployment, production access
## 🛡️ Enforcement Rules
```yaml
enforcement:
  spec_not_approved:
    message: "❌ SPEC not approved. Cannot proceed to design."
    allowed_tools: [Read, Grep]
    blocked_tools: [Edit, Write, Bash, TodoWrite]  # except for files under docs/spec/
  design_not_approved:
    message: "❌ DESIGN not approved. Cannot implement."
    allowed_tools: [Read, Grep, TodoWrite]
    blocked_tools: [Edit, Write, Bash]  # except for files under docs/spec/ and docs/design/
```
## 📋 Definition of Done
- ✅ Spec approved by stakeholder
- ✅ Design reviewed and approved
- ✅ All tasks completed in todo.yaml
- ✅ Unit tests written and passing
- ✅ Integration tests passing
- ✅ Security scan clean
- ✅ Documentation updated
- ✅ PR links to spec & design docs
````
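### 2. .claude/settings.json - Permission Control
Claude Code evaluates these rules itself, so they still apply when the model drifts from `CLAUDE.md`. `Read` and `Edit` rules take gitignore-style path patterns, and `Bash` rules match a command prefix written as `Bash(prefix:*)`. A bare tool name such as `WebFetch` matches every use of that tool, which is how tools are switched off; Claude Code has no separate `tools` section for this.
```json
{
  "permissions": {
    "deny": [
      "Read(./.env*)",
      "Read(./secrets/**)",
      "Read(./credentials/**)",
      "Bash(rm -rf:*)",
      "Bash(curl:*)",
      "Bash(wget:*)",
      "NotebookEdit",
      "WebFetch"
    ],
    "ask": [
      "Edit(**/*.py)",
      "Edit(**/*.js)",
      "Edit(**/*.ts)",
      "Write(**/*.py)",
      "Write(**/*.js)",
      "Write(**/*.ts)",
      "Bash(npm install:*)",
      "Bash(pip install:*)",
      "Bash(go get:*)"
    ],
    "allow": [
      "Read(docs/**)",
      "Read(README.md)",
      "Grep",
      "LS"
    ]
  }
}
```
Prefix rules on `Bash` are easy to sidestep (for example `rm -r -f`), so treat the `deny` list as a speed bump and rely on hooks for the phase gates.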
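### 3. Hooks - Gatekeeper Implementation
#### .claude/settings.json (`hooks` section)
Claude Code reads hooks from its settings files (`.claude/settings.json` or `.claude/settings.local.json`). Merge this `hooks` object into the settings file from step 2 rather than creating a separate config file. The `PreToolUse` entry is the spec gatekeeper, and the `PostToolUse` entry runs auto-formatting and tests after each edit. A `PreToolUse` hook blocks the tool call only when it exits with code 2. Any other non-zero exit is reported as a non-blocking error, and the call still goes ahead.
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 \"$CLAUDE_PROJECT_DIR\"/.claude/hooks/gatekeeper.py"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "python3 \"$CLAUDE_PROJECT_DIR\"/.claude/hooks/auto_format.py"
          }
        ]
      }
    ]
  }
}
```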
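#### .claude/hooks/gatekeeper.py
```python
#!/usr/bin/env python3
"""
Phase Gate Keeper - Enforces spec-first development.

Runs as a Claude Code PreToolUse hook for Edit|Write|MultiEdit, so
read-only tools never reach it. The pending tool call arrives as JSON
on stdin. Exit 0 allows the call; exit 2 blocks it and sends stderr
back to Claude as the reason.
"""
import json
import os
import sys

SPEC_FLAG = 'docs/spec/.approved'
DESIGN_FLAG = 'docs/design/.approved'
TASKS_FILE = 'docs/tasks/todo.yaml'


def block(*lines):
    """Print the reason to stderr (fed back to Claude) and block the call."""
    print('\n'.join(lines), file=sys.stderr)
    sys.exit(2)


def check_phase_gates(path):
    """Block the edit unless `path` (relative to the project root) is allowed now."""
    # Approval flags are created by a human reviewer, never by the agent
    if os.path.basename(path) == '.approved':
        block("❌ BLOCKED: Approval flags are created by a human reviewer")
    # Phase 1: spec documents are always editable; everything else needs approval
    if path.startswith('docs/spec/'):
        return
    if not os.path.exists(SPEC_FLAG):
        block("❌ BLOCKED: Specification not approved",
              "Required: Finish docs/spec/requirements.md and ask a human to review it",
              "The reviewer creates docs/spec/.approved after approval")
    # Phase 2: with the spec approved, only design documents are editable
    if path.startswith('docs/design/'):
        return
    if not os.path.exists(DESIGN_FLAG):
        block("❌ BLOCKED: Design not approved",
              "Required: Finish docs/design/architecture.md and ask a human to review it",
              "The reviewer creates docs/design/.approved after approval")
    # Phase 3: a task breakdown is recommended (warning only, not blocking)
    if not os.path.exists(TASKS_FILE):
        print("⚠️ WARNING: No task breakdown found")
        print("Recommended: Create docs/tasks/todo.yaml first")
    # Phase 4: implementation allowed
    print("✅ All gates passed - implementation allowed")


def main():
    payload = json.load(sys.stdin)
    # Resolve flag paths against the project root, not Claude's current directory
    root = os.environ.get('CLAUDE_PROJECT_DIR') or payload.get('cwd', '.')
    os.chdir(root)
    file_path = payload.get('tool_input', {}).get('file_path', '')
    rel_path = os.path.relpath(os.path.realpath(file_path)).replace(os.sep, '/')
    check_phase_gates(rel_path)
    sys.exit(0)  # Allow the operation


if __name__ == '__main__':
    main()
```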
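The gatekeeper can be tested without starting Claude Code by piping it a hand-built payload. The helper below is a hypothetical test harness. It fills in only the payload fields the gatekeeper reads (`hook_event_name`, `cwd`, `tool_name`, `tool_input.file_path`); Claude Code also sends fields such as `session_id` and `transcript_path`. It returns the exit code and stderr, so a blocked call should come back with code 2 and the reason text.

```python
import json
import subprocess
import sys


def run_hook(script_path, tool_name, file_path, cwd):
    """Pipe a PreToolUse-style payload into a hook script.

    Returns (exit_code, stderr). Only the fields the gatekeeper reads
    are included; this is a test harness, not Claude Code itself.
    """
    payload = {
        "hook_event_name": "PreToolUse",
        "cwd": cwd,
        "tool_name": tool_name,
        "tool_input": {"file_path": file_path},
    }
    proc = subprocess.run(
        [sys.executable, script_path],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        cwd=cwd,
    )
    return proc.returncode, proc.stderr
```

For example, `run_hook(".claude/hooks/gatekeeper.py", "Write", "src/app.py", ".")` should return code 2 while `docs/spec/.approved` is missing. The same call with `"docs/spec/requirements.md"` should return 0.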
### 4. Subagents - Role Separation
Claude Code reads `name`, `description`, and `tools` (a comma-separated list) from a subagent's frontmatter and uses the Markdown body as the subagent's system prompt. Restrictions therefore belong in the body, where the agent actually sees them; the gatekeeper hook is what enforces them.
#### .claude/agents/specifier.md
````markdown
---
name: specifier
description: Requirements analyst who creates and refines specifications
tools: Read, Write, Grep, TodoWrite
---
# Specifier Agent Instructions
You are responsible for the SPECIFICATION phase only.
## Restrictions
- Can only write to docs/spec/*
- Cannot edit source code
- Must produce requirements.md with clear acceptance criteria
## Your Tasks:
1. Interview stakeholders (via user interaction)
2. Document requirements in `docs/spec/requirements.md`
3. Define clear acceptance criteria
4. Create user stories with testable outcomes
## Output Format:
```markdown
# Requirements Specification
## User Stories
- As a [user type]
- I want [functionality]
- So that [business value]
## Acceptance Criteria
- [ ] Criterion 1 (testable)
- [ ] Criterion 2 (measurable)
- [ ] Criterion 3 (observable)
## Non-Functional Requirements
- Performance: [metrics]
- Security: [requirements]
- Scalability: [targets]
```
## Handoff:
Stop and request human review. After approving, the reviewer creates the `docs/spec/.approved` flag file.
````
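Before a reviewer creates `docs/spec/.approved`, the spec can be checked against the specifier's own output format. The helper below is a minimal, hypothetical sketch. It assumes the section names and placeholder strings from the template above and flags missing sections, an Acceptance Criteria section without checkbox items, and template placeholders that were never filled in.

```python
# Sections required by the specifier's output format
REQUIRED_SECTIONS = ("User Stories", "Acceptance Criteria", "Non-Functional Requirements")
# Template placeholders that should not survive into a spec under review
PLACEHOLDERS = ("[user type]", "[functionality]", "[business value]",
                "[metrics]", "[requirements]", "[targets]")


def spec_problems(text: str) -> list[str]:
    """Return reasons to hold back approval of docs/spec/requirements.md."""
    # Group body lines under their "## " heading
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS
                if name not in sections]
    checkboxes = [line for line in sections.get("Acceptance Criteria", [])
                  if line.startswith(("- [ ]", "- [x]"))]
    if not checkboxes:
        problems.append("no acceptance criteria checkboxes")
    leftovers = [p for p in PLACEHOLDERS if p in text]
    if leftovers:
        problems.append("unfilled placeholders: " + ", ".join(leftovers))
    return problems
```

An empty list means the document is structurally ready. Whether the criteria are actually testable is still the reviewer's call.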
#### .claude/agents/designer.md
````markdown
---
name: designer
description: System architect who creates technical designs from specifications
tools: Read, Write, Grep
---
# Designer Agent Instructions
You transform approved specifications into technical architecture.
## Prerequisites
- `docs/spec/.approved` must exist
## Restrictions
- Can only write to docs/design/*
- Cannot edit source code
- Must reference approved specifications
## Your Tasks:
1. Read approved `docs/spec/requirements.md`
2. Create `docs/design/architecture.md`
3. Define system components and interfaces
4. Create sequence/flow diagrams (mermaid)
## Output Format:
```markdown
# Technical Design
## Architecture Overview
[C4 Level 2 diagram]
## Components
- Component A: [responsibility]
  - Interface: [API spec]
  - Dependencies: [list]
## Data Flow
[Sequence diagram]
## Technology Stack
- Language: [choice + rationale]
- Framework: [choice + rationale]
- Database: [choice + rationale]
```
## Handoff:
Stop and request human review. After approving, the reviewer creates the `docs/design/.approved` flag file.
````
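The designer's output format can be checked the same way before review. A minimal sketch, assuming the four `##` section names from the template above and a fenced `mermaid` block as the evidence that a diagram exists (`design_doc_problems` is a hypothetical helper):

```python
import re

# Sections the designer's output format requires
REQUIRED_SECTIONS = ("Architecture Overview", "Components", "Data Flow", "Technology Stack")


def design_doc_problems(text: str) -> list[str]:
    """List reasons a design doc isn't ready for review; empty means OK."""
    # Collect level-2 headings only ("## Title", not "### Title")
    headings = {m.group(1).strip() for m in re.finditer(r"^## +(.+)$", text, re.MULTILINE)}
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS
                if name not in headings]
    if "```mermaid" not in text:
        problems.append("no mermaid diagram")
    return problems
```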
## 🎮 GitHub Copilot: Instruction-Based Control
### 1. .github/copilot-instructions.md - Repository-Wide Rules
````markdown
# Repository-Wide Copilot Instructions
## 🚨 MANDATORY WORKFLOW (All developers & agents)
### Phase Gates
1. **SPEC** → 2. **DESIGN** → 3. **TASKS** → 4. **IMPLEMENTATION** → 5. **TEST/PR**

**Absolute Rule**: Never proceed to the next phase without approval of the previous phase
### Approval Flags
- ✅ Spec Approved: `docs/spec/.approved`
- ✅ Design Approved: `docs/design/.approved`
- ✅ Tasks Defined: `docs/tasks/todo.yaml`
### Build & Test Commands (Required)
```bash
# Build
make build      # or: npm run build, go build ./...
# Test
make test       # or: npm test, go test ./...
# Lint
make lint       # or: npm run lint, golangci-lint run
# Security
make security   # or: npm audit, gosec ./...
```
### Definition of Done
- All acceptance criteria met
- Unit tests added (coverage > 80%)
- Integration tests passing
- Security scan clean
- Documentation updated
- PR description links to spec & design
- Code review approved
## 🛡️ Security Rules
### NEVER:
- Hardcode credentials or secrets
- Commit .env files
- Use eval() or exec() with user input
- Disable security features
- Skip input validation
### ALWAYS:
- Use environment variables for config
- Validate and sanitize all inputs
- Use parameterized queries
- Enable security headers
- Log security events
## 📝 Code Style
### General:
- Clear variable names (no abbreviations)
- Functions < 20 lines
- Files < 200 lines
- Cyclomatic complexity < 10
### Language Specific:
- Python: PEP 8, type hints required
- JavaScript/TypeScript: ESLint + Prettier
- Go: gofmt + golangci-lint
````
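The two line limits under **General** are easy to check mechanically, so CI does not have to rely on the agent honoring them. A minimal sketch for Python sources using the standard `ast` module, where `style_violations` is a hypothetical helper. Cyclomatic complexity is better left to an existing linter.

```python
import ast

MAX_FUNCTION_LINES = 20  # "Functions < 20 lines"
MAX_FILE_LINES = 200     # "Files < 200 lines"


def style_violations(source: str, filename: str = "<string>") -> list[str]:
    """Report the file and any functions that exceed the line limits."""
    violations = []
    line_count = len(source.splitlines())
    if line_count >= MAX_FILE_LINES:
        violations.append(f"{filename}: {line_count} lines (limit < {MAX_FILE_LINES})")
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is set on AST nodes from Python 3.8 onward
            length = node.end_lineno - node.lineno + 1
            if length >= MAX_FUNCTION_LINES:
                violations.append(f"{filename}:{node.lineno} {node.name}() is "
                                  f"{length} lines (limit < {MAX_FUNCTION_LINES})")
    return violations
```

Running it over each changed `.py` file in a PR and failing on a non-empty result turns the style rule into a gate rather than a suggestion.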
### 2. .github/instructions/ - Phase-Specific Instructions
#### design.instructions.md
````markdown
---
description: Design phase specific rules
applyTo: "docs/design/**"
---
# Design Phase Instructions
## Prerequisites Check
```bash
# This must pass before you start:
test -f docs/spec/.approved || echo "ERROR: Spec not approved!"
```
## Design Document Structure
- Architecture Overview (C4 model)
- Component Specifications
- Interface Definitions (OpenAPI/GraphQL/Proto)
- Data Models (ER diagrams)
- Security Architecture
- Deployment Architecture
## Required Diagrams (Mermaid)
- System Context (C4 Level 1)
- Container Diagram (C4 Level 2)
- Sequence Diagrams (key flows)
- State Machines (if applicable)
## Design Review Checklist
- Addresses all requirements from spec
- Scalability considered
- Security threats analyzed (STRIDE)
- Cost estimation provided
- Technology choices justified
- Interfaces fully specified
## Handoff
After approval, the reviewer creates the flag: `touch docs/design/.approved`
````
#### implementation.instructions.md
`applyTo` takes a single string; list several globs by separating them with commas.
````markdown
---
description: Implementation phase rules
applyTo: "src/**,lib/**,pkg/**"
---
# Implementation Phase Instructions
## Pre-Implementation Checklist
```bash
# All must exist:
test -f docs/spec/.approved || exit 1
test -f docs/design/.approved || exit 1
test -f docs/tasks/todo.yaml || exit 1
```
## Task Tracking
Before implementing any feature:
1. Check that the task exists in `docs/tasks/todo.yaml`
2. Update its status: todo → in-progress
3. Create a feature branch: `feature/TASK-ID-description`
## Code Requirements
### Every File MUST Have:
- Header comment with purpose
- Unit tests in the same directory (`_test.` file)
- Error handling (no silent failures)
- Logging (structured, with context)
### Every Function MUST:
- Have a JSDoc comment or docstring
- Validate inputs
- Handle edge cases
- Return errors instead of throwing, where the language supports it
## Testing Requirements
```bash
# Before ANY commit:
make test       # Must pass
make lint       # Must pass
make security   # Must pass
# Coverage requirement:
#   New code:      >= 80%
#   Modified code: >= 70%
```
## PR Template Usage
```markdown
## Changes
- [ ] Implements TASK-[ID] from todo.yaml
## Links
- Spec: [docs/spec/requirements.md](link)
- Design: [docs/design/architecture.md](link)
- Task: [TASK-ID in todo.yaml](link)
## Testing
- [ ] Unit tests added
- [ ] Integration tests updated
- [ ] Manual testing completed
## Checklist
- [ ] Code follows style guide
- [ ] Documentation updated
- [ ] No security issues
- [ ] Performance acceptable
```
````
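The Task Tracking and PR Template rules are mechanical enough to enforce in CI instead of trusting the agent. The sketch below is a hypothetical check with two assumptions: branch descriptions are lowercase slugs, and `tasks` is the already-parsed `tasks:` list from `docs/tasks/todo.yaml`. It verifies the branch name, the task's status, and the links in the PR body.

```python
import re

# Branch convention from "Task Tracking"; the lowercase-slug description is an assumption
BRANCH_RE = re.compile(r"^feature/(TASK-\d+)-[a-z0-9][a-z0-9-]*$")
# Documents the PR template's "Links" section must point to
REQUIRED_LINKS = ("docs/spec/requirements.md", "docs/design/architecture.md")


def pr_problems(branch: str, pr_body: str, tasks: list[dict]) -> list[str]:
    """Check a PR against the task-tracking and PR-template rules."""
    match = BRANCH_RE.match(branch)
    if not match:
        return [f"branch '{branch}' does not match feature/TASK-ID-description"]
    task_id = match.group(1)
    problems = []
    task = next((t for t in tasks if t.get("id") == task_id), None)
    if task is None:
        problems.append(f"{task_id} not found in docs/tasks/todo.yaml")
    elif task.get("status") == "todo":
        problems.append(f"{task_id} is still 'todo'; set it to in-progress first")
    for link in REQUIRED_LINKS:
        if link not in pr_body:
            problems.append(f"PR body does not link {link}")
    if task_id not in pr_body:
        problems.append(f"PR body does not mention {task_id}")
    return problems
```

In GitHub Actions, the branch is available as `GITHUB_HEAD_REF` on pull-request events. The PR body is in the event payload that `GITHUB_EVENT_PATH` points to.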
### 3. .github/prompts/ - Reusable Prompts
Prompt files are recognized by the `.prompt.md` extension.
#### 01-create-design.prompt.md
````markdown
---
mode: agent
description: Create technical design from approved spec
---
# Design Creation Task
## Input
Read the approved specification from `docs/spec/requirements.md`
## Process
1. Analyze functional requirements
2. Identify non-functional requirements
3. Propose architecture using C4 model
4. Define component interfaces
5. Specify data models
6. Plan deployment architecture
## Output
Create `docs/design/architecture.md` with:
- Architecture overview (C4 Level 1 & 2)
- Component specifications
- Interface definitions (OpenAPI/Proto)
- Data models (ER diagram)
- Deployment diagram
- Security considerations
## Constraints
- Do NOT edit any source code
- Reference specific requirements by ID
- Include mermaid diagrams
- Provide rationale for all decisions
## Completion
Request human review; the reviewer creates `docs/design/.approved` after approving.
````
#### 02-breakdown-tasks.prompt.md
````markdown
---
mode: agent
description: Break down design into implementation tasks
---
# Task Breakdown
## Input
- Approved spec: `docs/spec/requirements.md`
- Approved design: `docs/design/architecture.md`
## Process
1. Identify all components from design
2. Break each component into tasks
3. Estimate effort (S/M/L/XL)
4. Define dependencies
5. Assign priorities (P0/P1/P2)
## Output Format (docs/tasks/todo.yaml)
```yaml
tasks:
  - id: TASK-001
    title: "Implement user authentication"
    component: "auth-service"
    effort: M
    priority: P0
    status: todo
    dependencies: []
    acceptance:
      - "JWT tokens issued"
      - "Refresh token flow works"
      - "Rate limiting active"
  - id: TASK-002
    title: "Create user profile API"
    component: "user-service"
    effort: S
    priority: P1
    status: todo
    dependencies: [TASK-001]
    acceptance:
      - "CRUD operations work"
      - "Validation in place"
      - "Tests pass"
```
## Constraints
- Each task should be completable in 1-2 days
- Include clear acceptance criteria
- Define dependencies explicitly
````
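"Define dependencies explicitly" only helps if the dependencies are consistent. The check below catches IDs that don't exist and dependency cycles. It returns a valid implementation order using the standard library's `graphlib` (Python 3.9+). `implementation_order` is a hypothetical helper that takes the parsed `tasks:` list.

```python
from graphlib import CycleError, TopologicalSorter


def implementation_order(tasks: list[dict]) -> list[str]:
    """Return task IDs in an order that respects `dependencies`.

    Raises ValueError for unknown dependency IDs or dependency cycles.
    """
    ids = {t["id"] for t in tasks}
    graph = {}  # task ID -> IDs that must be done first
    for task in tasks:
        deps = task.get("dependencies", [])
        unknown = set(deps) - ids
        if unknown:
            raise ValueError(f"{task['id']} depends on unknown task(s): {sorted(unknown)}")
        graph[task["id"]] = deps
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes forming the cycle
        raise ValueError(f"dependency cycle: {err.args[1]}") from err
```

With PyYAML, the input would be `yaml.safe_load(open("docs/tasks/todo.yaml"))["tasks"]`. For the example file above, the result is `["TASK-001", "TASK-002"]`.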
### 4. Coding Agent Configuration
#### .github/copilot/firewall.json
GitHub manages the Copilot coding agent's firewall and its custom allowlist through the repository's Copilot settings rather than a file in the repository. Treat this JSON as a policy description to mirror in those settings.
```json
{
  "firewall": {
    "enabled": true,
    "rules": [
      { "name": "Block all by default", "action": "block", "pattern": "*" },
      { "name": "Allow npm registry", "action": "allow", "pattern": "registry.npmjs.org" },
      { "name": "Allow company registry", "action": "allow", "pattern": "registry.company.com" },
      { "name": "Allow GitHub", "action": "allow", "pattern": "github.com" },
      { "name": "Allow documentation sites", "action": "allow", "pattern": "*.readthedocs.io" }
    ],
    "blocked_commands": ["curl", "wget", "nc", "telnet"]
  }
}
```
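The policy reads as default-deny with explicit allows. A minimal sketch of that evaluation, using shell-style patterns via `fnmatch.fnmatchcase`. The matching semantics are an assumption; the real firewall's rules may differ.

```python
from fnmatch import fnmatchcase

# Allow patterns from the policy above; anything unmatched is blocked
ALLOW = ["registry.npmjs.org", "registry.company.com", "github.com", "*.readthedocs.io"]


def is_allowed(host: str, allow=ALLOW) -> bool:
    """Default-deny check: a host passes only if some allow pattern matches it."""
    host = host.lower().rstrip(".")  # hostnames are case-insensitive
    return any(fnmatchcase(host, pattern) for pattern in allow)
```

Writing the check out exposes two gaps in the policy. `github.com` does not cover `api.github.com`, and `*.readthedocs.io` does not cover bare `readthedocs.io`. Each needs its own entry if the agent must reach it.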
## 📊 Implementation Patterns and Best Practices
### Pattern 1: Staged Approval Flow
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant AI as AI Agent
    participant Gate as Gate Keeper
    participant Review as Reviewer
    Dev->>AI: Start new feature
    AI->>Gate: Check phase gates
    Gate-->>AI: ❌ Spec not approved
    AI->>Dev: Create spec first
    Dev->>AI: Write specification
    AI->>Dev: docs/spec/requirements.md created
    Review->>Review: Review spec
    Review->>Gate: Approve (create .approved)
    Dev->>AI: Create design
    AI->>Gate: Check phase gates
    Gate-->>AI: ✅ Spec approved
    AI->>Dev: docs/design/architecture.md created
```
### Pattern 2: Automatic Validation Loop
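```python
# .claude/hooks/auto_validator.py
import os
import subprocess


def validate_implementation(file_path):
    """Automatically validate a Python source file: tests, lint, coverage."""
    validations = []
    # 1. Check that a test file exists (foo.py -> foo_test.py)
    root, ext = os.path.splitext(file_path)
    test_file = f'{root}_test{ext}'
    if not os.path.exists(test_file):
        validations.append({
            'level': 'ERROR',
            'message': f'No test file found for {file_path}'
        })
        return validations  # nothing to run coverage against
    # 2. Run linter
    result = subprocess.run(['pylint', file_path], capture_output=True)
    if result.returncode != 0:
        validations.append({
            'level': 'WARNING',
            'message': 'Linting issues found'
        })
    # 3. Run the tests under coverage
    result = subprocess.run(
        ['coverage', 'run', '-m', 'pytest', test_file],
        capture_output=True
    )
    if result.returncode != 0:
        validations.append({'level': 'ERROR', 'message': 'Tests failed'})
    # 4. Enforce the 80% threshold for this file (non-zero exit if below)
    result = subprocess.run(
        ['coverage', 'report', '--fail-under=80',
         f'--include={os.path.abspath(file_path)}'],
        capture_output=True
    )
    if result.returncode != 0:
        validations.append({
            'level': 'ERROR',
            'message': f'Coverage below 80% for {file_path}'
        })
    return validations
```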
### Pattern 3: Agent Collaboration
Claude Code does not read a workflow file like this natively. The file documents the intended stage order, and an orchestrator script or a human walks through it. The `tasker` and `implementer` agents would be defined the same way as the specifier and designer above.
```yaml
# .claude/workflows/feature_development.yaml
workflow:
  name: "Feature Development"
  stages:
    - stage: specification
      agent: specifier
      outputs:
        - docs/spec/requirements.md
      approval_required: true
    - stage: design
      agent: designer
      inputs:
        - docs/spec/requirements.md
      outputs:
        - docs/design/architecture.md
      approval_required: true
    - stage: tasking
      agent: tasker
      inputs:
        - docs/spec/requirements.md
        - docs/design/architecture.md
      outputs:
        - docs/tasks/todo.yaml
      approval_required: false
    - stage: implementation
      agent: implementer
      inputs:
        - docs/tasks/todo.yaml
      outputs:
        - src/**
        - tests/**
      validation:
        - make test
        - make lint
```
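An orchestrator for this workflow mainly needs to answer one question: which stage, and which agent, comes next? The helper below is a hypothetical sketch. It reads the `stages` list from the YAML and a set of stage names a human has approved; in practice that set would come from the `.approved` flags. A stage whose outputs are globs like `src/**` cannot be verified by file existence, so the helper never treats it as finished.

```python
import os


def current_stage(stages, approved, exists=os.path.exists):
    """Return (stage, agent) for the first stage that still needs work, or None.

    A stage counts as finished when all of its outputs exist and, if
    `approval_required` is true, its name is in `approved`. Stages run
    strictly in order, so an unfinished stage holds back all later ones.
    """
    for stage in stages:
        outputs = stage.get("outputs", [])
        # Glob outputs (e.g. src/**) cannot be checked by existence
        outputs_ready = bool(outputs) and all(
            "*" not in path and exists(path) for path in outputs
        )
        signed_off = not stage.get("approval_required") or stage["stage"] in approved
        if not (outputs_ready and signed_off):
            return stage["stage"], stage["agent"]
    return None
```

With PyYAML, the stages would come from `yaml.safe_load(open(".claude/workflows/feature_development.yaml"))["workflow"]["stages"]`. The `exists` parameter lets tests substitute an in-memory file set.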
## 🚀 Implementation Procedure (Sprint 1 Checklist)
### Week 1: Foundation Setup
- Create the directory structure
```bash
mkdir -p docs/{spec,design,tasks}
mkdir -p .claude/{hooks,agents}
mkdir -p .github/{instructions,prompts,copilot}
```
- Configure Claude Code
```bash
# Create CLAUDE.md
cat > CLAUDE.md << 'EOF'
[CLAUDE.md content from above]
EOF
# Place settings.json
cat > .claude/settings.json << 'EOF'
[settings.json content from above]
EOF
# Set up hooks
python3 -m pip install pyyaml
[Place gatekeeper.py]
```
- Configure GitHub Copilot
```bash
# Place instructions
[Create copilot-instructions.md]
[Create instructions/*.instructions.md]
[Create prompts/*]
```
### Week 2: Process Establishment
- Validate the flow with a first feature
  - Define requirements with the Specifier agent
  - Execute the approval process
  - Create the design with the Designer agent
  - Break down tasks
  - Implement with gates
- Start metrics collection
  - Gate violation count
  - Time to approval
  - Implementation quality score
### Week 3: Improvement and Scale
- Collect feedback
- Adjust hooks and instructions
- Roll out to the entire team
## 📈 Effectiveness Measurement
### Before/After Comparison Metrics
| Metric | Before | After (Target) | Measurement Method |
|---|---|---|---|
| Implementation without spec rate | 73% | < 5% | Git log analysis |
| Design review execution rate | 31% | 100% | Approval flags |
| Test coverage | 42% | > 80% | Coverage.py |
| Production incident rate | 8.2% | < 2% | Incident management |
| Rework effort | 35% | < 10% | Task tracking |
### ROI Calculation Example
```python
# Annual cost reduction calculation
def calculate_roi():
    # Assumptions
    team_size = 10
    avg_salary = 80_000         # USD/year
    cost_per_incident = 50_000  # USD
    # Improvement effects (from the metrics table above)
    rework_reduction = 0.25     # 25-point rework reduction (35% -> <10%)
    incident_reduction = 0.06   # ~6-point incident-rate reduction (8.2% -> <2%)
    # Cost savings
    rework_savings = team_size * avg_salary * rework_reduction
    # Expected saving on a single $50k incident exposure; scale by the
    # number of releases the incident rate applies to for a real estimate
    incident_savings = cost_per_incident * incident_reduction
    # Implementation cost
    setup_cost = 20_000     # Initial setup
    training_cost = 10_000  # Training
    total_cost = setup_cost + training_cost
    # ROI
    annual_savings = rework_savings + incident_savings
    roi = (annual_savings - total_cost) / total_cost
    return {
        'annual_savings': annual_savings,
        'roi_percentage': roi * 100,
        'payback_months': total_cost / (annual_savings / 12),
    }


print(calculate_roi())
# annual_savings = 203000.0, roi_percentage ≈ 576.7, payback_months ≈ 1.77
```
## 🎓 Troubleshooting
### Common Issues and Solutions
#### Issue 1: Hooks Not Working
```bash
# Check permissions
chmod +x .claude/hooks/*.py
# Check Python environment
python3 --version
# Verify hook configuration (Claude Code reads hooks from its settings files)
cat .claude/settings.json
# Debug mode: start Claude Code with debug output, then run /hooks to list registered hooks
claude --debug
```
#### Issue 2: Copilot Ignoring Instructions
Add this to the top of `.github/copilot-instructions.md`:
```markdown
## ⚠️ CRITICAL RULES - MUST FOLLOW
These rules override all other considerations:
1. NEVER skip phase gates
2. ALWAYS check approval flags
3. MUST run tests before commit
```
#### Issue 3: Approval Flag Management
```bash
# Create approval script
cat > approve.sh << 'EOF'
#!/bin/bash
case "$1" in
  spec)
    touch docs/spec/.approved
    echo "✅ Spec approved"
    ;;
  design)
    touch docs/design/.approved
    echo "✅ Design approved"
    ;;
  *)
    echo "Usage: ./approve.sh [spec|design]"
    exit 1
    ;;
esac
EOF
chmod +x approve.sh
```
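A bare `touch` records nothing about what was approved, so the spec can change after sign-off without anyone noticing. The hypothetical helpers below make the flag stricter. The flag stores the approver, a UTC timestamp, and a SHA-256 of the approved document, so an edit made after approval invalidates it. The gatekeeper's `os.path.exists` check still accepts such a flag. Swapping that check for `approval_is_current` would also catch post-approval edits.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def _sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def approve(doc, flag, approver):
    """Write `flag` as a JSON record of who approved which version of `doc`."""
    record = {
        "document": str(doc),
        "sha256": _sha256(doc),
        "approved_by": approver,
        "approved_at": datetime.now(timezone.utc).isoformat(),
    }
    Path(flag).write_text(json.dumps(record, indent=2) + "\n")
    return record


def approval_is_current(flag):
    """True only if `flag` holds a record and the document is unchanged since."""
    try:
        record = json.loads(Path(flag).read_text())
        return _sha256(record["document"]) == record["sha256"]
    except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError):
        # Missing flag or document, an empty `touch`-style flag, or a bad record
        return False
```

Example: `approve("docs/spec/requirements.md", "docs/spec/.approved", "alice")`, run by the reviewer, not the agent.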
## 🔮 Future Outlook
### Next-Generation Features
- AI Auditor: Automate part of the approval process
- Quality Prediction: Predict quality issues before implementation
- Auto Refactoring: Automatically suggest design improvements
- Multi-Agent Collaboration: Automatic coordination of multiple AIs
### Planned Tool Integrations
- Amazon Kiro: Native integration
- MCP (Model Context Protocol): External system integration
- GitHub Issues: Automatic task generation
- Jira/Linear: Task synchronization
## 📚 Summary
### Core Value
- ✅ From "recommended" to "enforced" spec-driven development
- ✅ Quality assurance through technical guardrails
- ✅ Complete control of agent behavior
- ✅ Auditable development process
### Implementation Priority
- Essential: CLAUDE.md + settings.json (Claude) / copilot-instructions.md (Copilot)
- Recommended: Hooks/Gates + Instructions files
- Optional: Subagents + Prompts + Firewall
### Next Action
```bash
# Quick start
git clone https://github.com/your-org/spec-first-template
cd spec-first-template
./setup.sh
```
> **Key to Success**
> Start with a small project, observe how the team reacts, and roll out gradually in stages.
## 🔗 Related Resources
- Claude Code Official Documentation
- GitHub Copilot Instructions
- Our Spec-Based Development Article
- Amazon Kiro Complete Guide
Tags: #SpecFirst #ClaudeCode #GitHubCopilot #Hooks #Instructions #Gatekeeper #QualityGates #AgenticCoding #DevOps #SDLC