LLM Debugging / Diagnostics

Major Failure Types

| Type | Symptom | Representative Metric | Initial Question |
| --- | --- | --- | --- |
| Output Deviation | Structure Broken | JSON Parse Failure Rate | Schema Up-to-Date? |
| Inference Error | Logic Breakdown | Evaluation Set Error Rate | Insufficient Chain-of-Thought? |
| Context Loss | Required Info Not Referenced | Hit Rate | Top-k Search Appropriate? |
| Guard Failure | Prohibited Generation | Filter Log | Rule Explicitness? |
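
Two of the representative metrics above can be computed directly from logged outputs. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, and "hit rate" is assumed here to mean the fraction of queries whose top-k retrieved document IDs include the one required document, which the table itself does not define.

```python
import json


def json_parse_failure_rate(outputs):
    """Fraction of model outputs that cannot be parsed as JSON."""
    if not outputs:
        return 0.0
    failures = 0
    for text in outputs:
        try:
            json.loads(text)
        except (json.JSONDecodeError, TypeError):
            # TypeError covers non-string outputs such as None.
            failures += 1
    return failures / len(outputs)


def retrieval_hit_rate(retrieved, required):
    """Fraction of queries whose top-k results contain the required ID.

    Assumption: `retrieved` is a list of top-k ID lists and `required`
    is a parallel list holding one required ID per query.
    """
    if not required:
        return 0.0
    hits = sum(1 for docs, need in zip(retrieved, required) if need in docs)
    return hits / len(required)


if __name__ == "__main__":
    outputs = ['{"a": 1}', '{"a": ', "not json", "[]"]
    print("parse failure rate:", json_parse_failure_rate(outputs))
    print("hit rate:", retrieval_hit_rate([["d1", "d2"], ["d3"]], ["d2", "d9"]))
```

A rising parse failure rate points back to the schema question in the first row; a falling hit rate points to the top-k setting in the third.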

Observability Stack

  1. Prompt diff retention (each version keyed by a hash)
  2. Input/output token statistics
  3. Re-execution of evaluation samples
  4. Post-hoc analysis (clustering)
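
The first two items of the stack can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the function names and record layout are hypothetical, the hash is a truncated SHA-256 of the prompt text, and token counts use whitespace splitting as a stand-in for whatever tokenizer the deployed model actually uses.

```python
import difflib
import hashlib
import statistics


def prompt_hash(prompt: str) -> str:
    """Short, stable identifier for a prompt version (truncated SHA-256)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def prompt_diff_record(old: str, new: str) -> dict:
    """Unified diff between two prompt versions, labeled by their hashes."""
    old_id, new_id = prompt_hash(old), prompt_hash(new)
    diff_lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_id,
        tofile=new_id,
        lineterm="",
    )
    return {"old_hash": old_id, "new_hash": new_id, "diff": "\n".join(diff_lines)}


def count_tokens(text: str) -> int:
    """Assumption: whitespace split approximates the real tokenizer."""
    return len(text.split())


def token_stats(counts):
    """Summary statistics over a list of per-request token counts."""
    if not counts:
        return {"n": 0, "mean": 0, "max": 0}
    return {"n": len(counts), "mean": statistics.mean(counts), "max": max(counts)}


if __name__ == "__main__":
    record = prompt_diff_record(
        "Answer in JSON.\nBe brief.",
        "Answer in JSON.\nBe concise.",
    )
    print(record["diff"])
    print(token_stats([count_tokens("a b c"), count_tokens("a")]))
```

Storing the hash next to every logged request makes it possible to group failures by prompt version before re-running evaluation samples or clustering them.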

Back to: index.md