Skip to content

Practical Guide to Enforcing Spec-Based Development with AI Agents: Claude Code & GitHub Copilot Complete Setup [2025 Edition]

Key Points

  • Technically enforce the spec → design → implementation flow to prevent vibe coding chaos
  • Set gate conditions in Claude Code and GitHub Copilot to block implementation without approval
  • Complete development process control through instruction files, permissions, and hooks
  • Quality assurance through agent role separation and permission restrictions

Important: Why "Enforcement" is Necessary

AI agents tend to treat instructions as "recommendations." Without technical enforcement mechanisms, you'll inevitably revert to vibe coding. This article provides concrete techniques to block implementation.

🎯 Overall Design: Spec-Based Development Enforcement Architecture

Phase Gate Approach

graph LR
    A[📋 SPEC] -->|Approval Flag| B[🏗️ DESIGN]
    B -->|Approval Flag| C[📝 TASKS]
    C -->|Approval Flag| D[💻 IMPLEMENTATION]
    D -->|Auto Validation| E[✅ TEST/PR]

    A -.->|❌ Blocked| D
    B -.->|❌ Blocked| D

    style A fill:#ffe0e0
    style B fill:#fff0e0
    style C fill:#e0f0ff
    style D fill:#e0ffe0
    style E fill:#f0e0ff

Enforcement Mechanism Comparison

ToolEnforcement LevelMechanismEffect
Claude Code⭐⭐⭐⭐⭐Hooks + PermissionsComplete blocking possible
GitHub Copilot⭐⭐⭐⭐Instructions + FirewallStrong guidance + restrictions
Traditional Instructions⭐⭐README/CommentsEasily ignored

🔒 Claude Code: Complete Control Implementation

1. CLAUDE.md - Memory and Behavioral Rules

# Project Operating Rules (Spec-first ENFORCED)

## ⚠️ CRITICAL: Phase Gates

### Phase 1: SPECIFICATION
- Location: `docs/spec/requirements.md`
- Approval Flag: `docs/spec/.approved`
- **BLOCKED ACTIONS**: Edit, Write, Bash (except spec-related)

### Phase 2: DESIGN
- **PREREQUISITE**: `docs/spec/.approved` MUST exist
- Location: `docs/design/architecture.md`
- Approval Flag: `docs/design/.approved`
- **BLOCKED ACTIONS**: Code editing, test execution

### Phase 3: TASKING
- **PREREQUISITE**: `docs/design/.approved` MUST exist
- Location: `docs/tasks/todo.yaml`
- Format: YAML with status field (todo/in-progress/done)

### Phase 4: IMPLEMENTATION
- **PREREQUISITE**: All above flags MUST exist
- **NOW ALLOWED**: Edit, Write source files
- **STILL BLOCKED**: Direct deployment, production access

## 🛡️ Enforcement Rules

```yaml
enforcement:
  spec_not_approved:
    message: "❌ SPEC not approved. Cannot proceed to design."
    allowed_tools: [Read, Grep]
    blocked_tools: [Edit, Write, Bash, TodoWrite]

  design_not_approved:
    message: "❌ DESIGN not approved. Cannot implement."
    allowed_tools: [Read, Grep, TodoWrite]
    blocked_tools: [Edit, Write, Bash]

📋 Definition of Done

  1. ✅ Spec approved by stakeholder
  2. ✅ Design reviewed and approved
  3. ✅ All tasks completed in todo.yaml
  4. ✅ Unit tests written and passing
  5. ✅ Integration tests passing
  6. ✅ Security scan clean
  7. ✅ Documentation updated
  8. ✅ PR links to spec & design docs
    ### 2. .claude/settings.json - Permission Control
    
    ```json
    {
      "permissions": {
        "deny": [
          "Read(./.env*)",
          "Read(./secrets/**)",
          "Read(./credentials/**)",
          "Bash(rm -rf *)",
          "Bash(curl *)",
          "Bash(wget *)"
        ],
        "ask": [
          "Edit(**/*.py)",
          "Edit(**/*.js)",
          "Edit(**/*.ts)",
          "Write(**/*.py)",
          "Write(**/*.js)",
          "Write(**/*.ts)",
          "Bash(npm install *)",
          "Bash(pip install *)",
          "Bash(go get *)"
        ],
        "allow": [
          "Read(docs/**)",
          "Read(README.md)",
          "Grep(**)",
          "LS(**)"
        ]
      },
      "tools": {
        "disabled_by_default": ["NotebookEdit", "WebFetch"],
        "require_confirmation": ["MultiEdit", "Write"]
      }
    }
    

3. Hooks - Gatekeeper Implementation

.claude/hooks/config.json

{
  "hooks": {
    "PreToolUse": [
      {
        "name": "Spec Gate Keeper",
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/gatekeeper.py",
            "fail_on_error": true
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "name": "Auto Format & Test",
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/auto_format.py"
          }
        ]
      }
    ]
  }
}

.claude/hooks/gatekeeper.py

#!/usr/bin/env python3
"""
Phase Gate Keeper - Enforces spec-first development
"""
import os
import sys
import json

def check_phase_gates():
    """Check if current phase allows the requested action"""

    # Phase 1: Check SPEC approval
    if not os.path.exists('docs/spec/.approved'):
        print("❌ BLOCKED: Specification not approved")
        print("Required: Review and approve docs/spec/requirements.md first")
        print("Run: touch docs/spec/.approved (after human review)")
        return False

    # Phase 2: Check DESIGN approval
    if not os.path.exists('docs/design/.approved'):
        # Allow spec and design editing only
        if any(arg in sys.argv for arg in ['docs/spec', 'docs/design']):
            return True
        print("❌ BLOCKED: Design not approved")
        print("Required: Review and approve docs/design/architecture.md")
        print("Run: touch docs/design/.approved (after human review)")
        return False

    # Phase 3: Check task planning
    if not os.path.exists('docs/tasks/todo.yaml'):
        print("⚠️ WARNING: No task breakdown found")
        print("Recommended: Create docs/tasks/todo.yaml first")
        # Warning only, not blocking

    # Phase 4: Implementation allowed
    print("✅ All gates passed - implementation allowed")
    return True

def main():
    # Get the tool being used from environment or args
    tool = os.environ.get('CLAUDE_TOOL', '')

    # Read-only operations always allowed
    if tool in ['Read', 'Grep', 'LS', 'BashOutput']:
        sys.exit(0)

    # Check phase gates for write operations
    if not check_phase_gates():
        sys.exit(1)  # Block the operation

    sys.exit(0)  # Allow the operation

if __name__ == '__main__':
    main()

4. Subagents - Role Separation

.claude/agents/specifier.md

---
name: specifier
description: Requirements analyst who creates and refines specifications
tools: [Read, Write, Grep, TodoWrite]
restrictions:
  - Can only write to docs/spec/*
  - Cannot edit source code
  - Must produce requirements.md with clear acceptance criteria
---

# Specifier Agent Instructions

You are responsible for the SPECIFICATION phase only.

## Your Tasks:
1. Interview stakeholders (via user interaction)
2. Document requirements in `docs/spec/requirements.md`
3. Define clear acceptance criteria
4. Create user stories with testable outcomes

## Output Format:
```markdown
# Requirements Specification

## User Stories
- As a [user type]
- I want [functionality]
- So that [business value]

## Acceptance Criteria
- [ ] Criterion 1 (testable)
- [ ] Criterion 2 (measurable)
- [ ] Criterion 3 (observable)

## Non-Functional Requirements
- Performance: [metrics]
- Security: [requirements]
- Scalability: [targets]

Handoff:

After approval, create docs/spec/.approved flag file.

#### .claude/agents/designer.md

```markdown
---
name: designer
description: System architect who creates technical designs from specifications
tools: [Read, Write, Grep]
prerequisites:
  - docs/spec/.approved must exist
restrictions:
  - Can only write to docs/design/*
  - Cannot edit source code
  - Must reference approved specifications
---

# Designer Agent Instructions

You transform approved specifications into technical architecture.

## Your Tasks:
1. Read approved `docs/spec/requirements.md`
2. Create `docs/design/architecture.md`
3. Define system components and interfaces
4. Create sequence/flow diagrams (mermaid)

## Output Format:
```markdown
# Technical Design

## Architecture Overview
[C4 Level 2 diagram]

## Components
- Component A: [responsibility]
  - Interface: [API spec]
  - Dependencies: [list]

## Data Flow
[Sequence diagram]

## Technology Stack
- Language: [choice + rationale]
- Framework: [choice + rationale]
- Database: [choice + rationale]

Handoff:

After review, create docs/design/.approved flag file.

## 🎮 GitHub Copilot: Instruction-Based Control

### 1. .github/copilot-instructions.md - Repository-Wide Rules

```markdown
# Repository-Wide Copilot Instructions

## 🚨 MANDATORY WORKFLOW (All developers & agents)

### Phase Gates
1. **SPEC** → 2. **DESIGN** → 3. **TASKS** → 4. **IMPLEMENTATION** → 5. **TEST/PR**

**Absolute Rule**: Never proceed to next phase without prior phase approval

### Approval Flags
- ✅ Spec Approved: `docs/spec/.approved`
- ✅ Design Approved: `docs/design/.approved`
- ✅ Tasks Defined: `docs/tasks/todo.yaml`

### Build & Test Commands (Required)
```bash
# Build
make build    # or: npm run build, go build ./...

# Test  
make test     # or: npm test, go test ./...

# Lint
make lint     # or: npm run lint, golangci-lint run

# Security
make security # or: npm audit, gosec ./...

Definition of Done

  • All acceptance criteria met
  • Unit tests added (coverage > 80%)
  • Integration tests passing
  • Security scan clean
  • Documentation updated
  • PR description links to spec & design
  • Code review approved

🛡️ Security Rules

NEVER:

  • Hardcode credentials or secrets
  • Commit .env files
  • Use eval() or exec() with user input
  • Disable security features
  • Skip input validation

ALWAYS:

  • Use environment variables for config
  • Validate and sanitize all inputs
  • Use parameterized queries
  • Enable security headers
  • Log security events

📝 Code Style

General:

  • Clear variable names (no abbreviations)
  • Functions < 20 lines
  • Files < 200 lines
  • Cyclomatic complexity < 10

Language Specific:

  • Python: PEP 8, Type hints required
  • JavaScript/TypeScript: ESLint + Prettier
  • Go: gofmt + golangci-lint
    ### 2. .github/instructions/ - Phase-Specific Instructions
    
    #### design.instructions.md
    
    ```markdown
    ---
    description: Design phase specific rules
    applyTo: "docs/design/**"
    ---
    
    # Design Phase Instructions
    
    ## Prerequisites Check
    ```bash
    # This must pass before you start:
    test -f docs/spec/.approved || echo "ERROR: Spec not approved!"
    

Design Document Structure

  1. Architecture Overview (C4 model)
  2. Component Specifications
  3. Interface Definitions (OpenAPI/GraphQL/Proto)
  4. Data Models (ER diagrams)
  5. Security Architecture
  6. Deployment Architecture

Required Diagrams (Mermaid)

  • System Context (C4 Level 1)
  • Container Diagram (C4 Level 2)
  • Sequence Diagrams (key flows)
  • State Machines (if applicable)

Design Review Checklist

  • Addresses all requirements from spec
  • Scalability considered
  • Security threats analyzed (STRIDE)
  • Cost estimation provided
  • Technology choices justified
  • Interfaces fully specified

Handoff

After approval, create: touch docs/design/.approved

#### implementation.instructions.md

```markdown
---
description: Implementation phase rules
applyTo: ["src/**", "lib/**", "pkg/**"]
---

# Implementation Phase Instructions

## Pre-Implementation Checklist
```bash
# All must exist:
test -f docs/spec/.approved || exit 1
test -f docs/design/.approved || exit 1
test -f docs/tasks/todo.yaml || exit 1

Task Tracking

Before implementing any feature: 1. Check task exists in docs/tasks/todo.yaml 2. Update status: todoin-progress 3. Create feature branch: feature/TASK-ID-description

Code Requirements

Every File MUST Have:

  • Header comment with purpose
  • Unit tests in same directory (_test. file)
  • Error handling (no silent failures)
  • Logging (structured, with context)

Every Function MUST:

  • Have JSDoc/docstring
  • Validate inputs
  • Handle edge cases
  • Return errors (not throw, where applicable)

Testing Requirements

# Before ANY commit:
make test        # Must pass
make lint        # Must pass
make security    # Must pass

# Coverage requirement:
# New code: >= 80%
# Modified code: >= 70%

PR Template Usage

## Changes
- [ ] Implements TASK-[ID] from todo.yaml

## Links
- Spec: [docs/spec/requirements.md](link)
- Design: [docs/design/architecture.md](link)
- Task: [TASK-ID in todo.yaml](link)

## Testing
- [ ] Unit tests added
- [ ] Integration tests updated
- [ ] Manual testing completed

## Checklist
- [ ] Code follows style guide
- [ ] Documentation updated
- [ ] No security issues
- [ ] Performance acceptable
### 3. .github/prompts/ - Reusable Prompts

#### 01-create-design.md

```markdown
---
mode: agent
description: Create technical design from approved spec
---

# Design Creation Task

## Input
Read the approved specification from `docs/spec/requirements.md`

## Process
1. Analyze functional requirements
2. Identify non-functional requirements  
3. Propose architecture using C4 model
4. Define component interfaces
5. Specify data models
6. Plan deployment architecture

## Output
Create `docs/design/architecture.md` with:
- Architecture overview (C4 Level 1 & 2)
- Component specifications
- Interface definitions (OpenAPI/Proto)
- Data models (ER diagram)
- Deployment diagram
- Security considerations

## Constraints
- Do NOT edit any source code
- Reference specific requirements by ID
- Include mermaid diagrams
- Provide rationale for all decisions

## Completion
Request human review for `docs/design/.approved`

02-breakdown-tasks.md

---
mode: agent
description: Break down design into implementation tasks
---

# Task Breakdown

## Input
- Approved spec: `docs/spec/requirements.md`
- Approved design: `docs/design/architecture.md`

## Process
1. Identify all components from design
2. Break each component into tasks
3. Estimate effort (S/M/L/XL)
4. Define dependencies
5. Assign priorities (P0/P1/P2)

## Output Format (docs/tasks/todo.yaml)
```yaml
tasks:
  - id: TASK-001
    title: "Implement user authentication"
    component: "auth-service"
    effort: M
    priority: P0
    status: todo
    dependencies: []
    acceptance:
      - "JWT tokens issued"
      - "Refresh token flow works"
      - "Rate limiting active"

  - id: TASK-002
    title: "Create user profile API"
    component: "user-service"
    effort: S
    priority: P1
    status: todo
    dependencies: [TASK-001]
    acceptance:
      - "CRUD operations work"
      - "Validation in place"
      - "Tests pass"

Constraints

  • Each task should be completable in 1-2 days
  • Include clear acceptance criteria
  • Define dependencies explicitly
    ### 4. Coding Agent Configuration
    
    #### .github/copilot/firewall.json
    
    ```json
    {
      "firewall": {
        "enabled": true,
        "rules": [
          {
            "name": "Block all by default",
            "action": "block",
            "pattern": "*"
          },
          {
            "name": "Allow npm registry",
            "action": "allow",
            "pattern": "registry.npmjs.org"
          },
          {
            "name": "Allow company registry",
            "action": "allow",
            "pattern": "registry.company.com"
          },
          {
            "name": "Allow GitHub",
            "action": "allow",
            "pattern": "github.com"
          },
          {
            "name": "Allow documentation sites",
            "action": "allow",
            "pattern": "*.readthedocs.io"
          }
        ],
        "blocked_commands": [
          "curl",
          "wget",
          "nc",
          "telnet"
        ]
      }
    }
    

📊 Implementation Patterns and Best Practices

Pattern 1: Staged Approval Flow

sequenceDiagram
    participant Dev as Developer
    participant AI as AI Agent
    participant Gate as Gate Keeper
    participant Review as Reviewer

    Dev->>AI: Start new feature
    AI->>Gate: Check phase gates
    Gate-->>AI: ❌ Spec not approved
    AI->>Dev: Create spec first

    Dev->>AI: Write specification
    AI->>Dev: docs/spec/requirements.md created

    Review->>Review: Review spec
    Review->>Gate: Approve (create .approved)

    Dev->>AI: Create design
    AI->>Gate: Check phase gates
    Gate-->>AI: ✅ Spec approved
    AI->>Dev: docs/design/architecture.md created

Pattern 2: Automatic Validation Loop

# .claude/hooks/auto_validator.py
import subprocess
import json

def validate_implementation(file_path):
    """Automatically validate implementation against spec"""

    validations = []

    # 1. Check if tests exist
    test_file = file_path.replace('.py', '_test.py')
    if not os.path.exists(test_file):
        validations.append({
            'level': 'ERROR',
            'message': f'No test file found for {file_path}'
        })

    # 2. Run linter
    result = subprocess.run(['pylint', file_path], capture_output=True)
    if result.returncode != 0:
        validations.append({
            'level': 'WARNING',
            'message': 'Linting issues found'
        })

    # 3. Check coverage
    result = subprocess.run(
        ['coverage', 'run', '-m', 'pytest', test_file],
        capture_output=True
    )
    # Parse coverage and check threshold

    return validations

Pattern 3: Agent Collaboration

# .claude/workflows/feature_development.yaml
workflow:
  name: "Feature Development"
  stages:
    - stage: specification
      agent: specifier
      outputs:
        - docs/spec/requirements.md
      approval_required: true

    - stage: design
      agent: designer
      inputs:
        - docs/spec/requirements.md
      outputs:
        - docs/design/architecture.md
      approval_required: true

    - stage: tasking
      agent: tasker
      inputs:
        - docs/spec/requirements.md
        - docs/design/architecture.md
      outputs:
        - docs/tasks/todo.yaml
      approval_required: false

    - stage: implementation
      agent: implementer
      inputs:
        - docs/tasks/todo.yaml
      outputs:
        - src/**
        - tests/**
      validation:
        - make test
        - make lint

🚀 Implementation Procedure (Sprint 1 Checklist)

Week 1: Foundation Setup

  • Create directory structure

    mkdir -p docs/{spec,design,tasks}
    mkdir -p .claude/{hooks,agents}
    mkdir -p .github/{instructions,prompts,copilot}
    

  • Configure Claude Code

    # Create CLAUDE.md
    cat > CLAUDE.md << 'EOF'
    [CLAUDE.md content from above]
    EOF
    
    # Place settings.json
    cat > .claude/settings.json << 'EOF'
    [settings.json content from above]
    EOF
    
    # Setup Hooks
    python3 -m pip install pyyaml
    [Place gatekeeper.py]
    

  • Configure GitHub Copilot

    # Place Instructions
    [Create copilot-instructions.md]
    [Create instructions/*.instructions.md]
    [Create prompts/*]
    

Week 2: Process Establishment

  • Validate flow with first feature
  • Define requirements with Specifier agent
  • Execute approval process
  • Create design with Designer agent
  • Break down tasks
  • Implement with gates

  • Start metrics collection

  • Gate violation count
  • Time to approval
  • Implementation quality score

Week 3: Improvement and Scale

  • Collect feedback
  • Adjust Hooks/Instructions
  • Roll out to entire team

📈 Effectiveness Measurement

Before/After Comparison Metrics

MetricBeforeAfter (Target)Measurement Method
Implementation without spec rate73%< 5%Git log analysis
Design review execution rate31%100%Approval flags
Test coverage42%> 80%Coverage.py
Production incident rate8.2%< 2%Incident management
Rework effort35%< 10%Task tracking

ROI Calculation Example

# Annual cost reduction calculation
def calculate_roi():
    # Assumptions
    team_size = 10
    avg_salary = 80_000  # USD/year

    # Improvement effects
    rework_reduction = 0.25  # 25% rework reduction
    incident_reduction = 0.06  # 6% incident reduction

    # Cost savings
    rework_savings = team_size * avg_salary * rework_reduction
    incident_savings = 50_000 * incident_reduction  # $50k per incident

    # Implementation cost
    setup_cost = 20_000  # Initial setup
    training_cost = 10_000  # Training

    # ROI
    annual_savings = rework_savings + incident_savings
    roi = (annual_savings - (setup_cost + training_cost)) / (setup_cost + training_cost)

    return {
        'annual_savings': annual_savings,
        'roi_percentage': roi * 100,
        'payback_months': (setup_cost + training_cost) / (annual_savings / 12)
    }

🎓 Troubleshooting

Common Issues and Solutions

Issue 1: Hooks Not Working

# Check permissions
chmod +x .claude/hooks/*.py

# Check Python environment
python3 --version

# Verify Hook configuration
cat .claude/hooks/config.json

# Debug mode
export CLAUDE_DEBUG=1

Issue 2: Copilot Ignoring Instructions

# Add to top of .github/copilot-instructions.md
## ⚠️ CRITICAL RULES - MUST FOLLOW

These rules override all other considerations:
1. NEVER skip phase gates
2. ALWAYS check approval flags
3. MUST run tests before commit

Issue 3: Approval Flag Management

# Create approval script
cat > approve.sh << 'EOF'
#!/bin/bash
case $1 in
  spec)
    touch docs/spec/.approved
    echo "✅ Spec approved"
    ;;
  design)
    touch docs/design/.approved
    echo "✅ Design approved"
    ;;
  *)
    echo "Usage: ./approve.sh [spec|design]"
    ;;
esac
EOF
chmod +x approve.sh

🔮 Future Outlook

Next-Generation Features

  1. AI Auditor: Automate part of the approval process
  2. Quality Prediction: Predict quality issues before implementation
  3. Auto Refactoring: Automatically suggest design improvements
  4. Multi-Agent Collaboration: Automatic coordination of multiple AIs

Planned Tool Integrations

  • Amazon Kiro: Native integration
  • MCP (Model Context Protocol): External system integration
  • GitHub Issues: Automatic task generation
  • Jira/Linear: Task synchronization

📚 Summary

Core Value

From "recommended" to "enforced" spec-driven developmentQuality assurance through technical guardrailsComplete control of agent behaviorAuditable development process

Implementation Priority

  1. Essential: CLAUDE.md + settings.json (Claude) / copilot-instructions.md (Copilot)
  2. Recommended: Hooks/Gates + Instructions files
  3. Optional: Subagents + Prompts + Firewall

Next Action

# Quick start
git clone https://github.com/your-org/spec-first-template
cd spec-first-template
./setup.sh

Key to Success

Start with a small project first, observe team reactions, and gradually roll out in stages—that's the path to success.


Tags: #SpecFirst #ClaudeCode #GitHubCopilot #Hooks #Instructions #Gatekeeper #QualityGates #AgenticCoding #DevOps #SDLC