LLM Debugging / Diagnostics
Major Failure Types
| Type | Symptom | Representative Metric | First Question to Ask |
|---|---|---|---|
| Output deviation | Output structure is broken | JSON parse failure rate | Is the schema up to date? |
| Reasoning error | Logic breaks down | Error rate on the evaluation set | Is the chain-of-thought insufficient? |
| Context loss | Required information is not referenced | Retrieval hit rate | Is the top-k search setting appropriate? |
| Guard failure | Prohibited content is generated | Filter log | Are the rules stated explicitly? |
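
Two of the metrics in the table can be computed directly from logged outputs. The following is a minimal sketch, not a reference implementation: the function names `json_parse_failure_rate` and `hit_rate` are hypothetical, and the hit-rate definition used here (a query counts as a hit if at least one required document ID appears in its top-k results) is an assumption, since the table does not define it.

```python
import json


def json_parse_failure_rate(outputs):
    """Fraction of model outputs that do not parse as JSON."""
    if not outputs:
        return 0.0
    failures = 0
    for text in outputs:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            failures += 1
    return failures / len(outputs)


def hit_rate(retrieved, required, k):
    """Fraction of queries whose top-k results contain a required ID.

    Assumed definition: one hit per query if any required document ID
    is among the first k retrieved IDs.
    """
    if not required:
        return 0.0
    hits = 0
    for got, need in zip(retrieved, required):
        if set(got[:k]) & set(need):
            hits += 1
    return hits / len(required)


# Illustrative data only, not real measurements.
sample_outputs = ['{"answer": 42}', '{"answer": 42', "not json", "[1, 2, 3]"]
parse_failure = json_parse_failure_rate(sample_outputs)

retrieved_ids = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
required_ids = [["d2"], ["d9"]]
retrieval_hit = hit_rate(retrieved_ids, required_ids, k=2)
```

Tracking both numbers per prompt version makes it easier to tell whether a regression comes from the output format or from retrieval.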
Observability Stack
- Prompt diffs preserved with a hash per version
- Input/output token statistics
- Re-execution of evaluation samples
- Post-hoc analysis (clustering)
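
The first two items in the list above can be sketched with the Python standard library. This is an illustration under stated assumptions: the helper names `prompt_hash`, `prompt_diff`, and `token_stats` are hypothetical, the 12-character truncated SHA-256 is an arbitrary choice, and whitespace splitting is only a stand-in for the tokenizer of whatever model is in use.

```python
import difflib
import hashlib
import statistics


def prompt_hash(prompt):
    """Short, stable identifier for a prompt version (truncated SHA-256)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def prompt_diff(old, new):
    """Unified diff between two prompt versions, labelled by their hashes."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=prompt_hash(old),
        tofile=prompt_hash(new),
        lineterm="",
    )
    return "\n".join(lines)


def token_stats(texts):
    """Mean and max token counts; whitespace split is a rough proxy only."""
    counts = [len(t.split()) for t in texts]
    if not counts:
        return {"mean": 0, "max": 0}
    return {"mean": statistics.mean(counts), "max": max(counts)}


# Illustrative prompts only.
old_prompt = "You are a helpful assistant.\nAnswer in JSON."
new_prompt = "You are a helpful assistant.\nAnswer in JSON matching the schema."

diff_text = prompt_diff(old_prompt, new_prompt)
stats = token_stats(["a b c", "a"])
```

Storing `diff_text` alongside the two hashes gives each logged request a traceable prompt version, so a spike in a failure metric can be tied to the exact prompt change that preceded it.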