Evaluation
Hallucination
When an LLM generates confident, fluent text that is factually incorrect or entirely fabricated.
LLMs don't retrieve facts from a database; they generate text from statistical patterns learned during training. Sometimes those patterns produce false statements with the same confident fluency as true ones. The model doesn't "know" it is wrong: it has no explicit truth signal, only probabilities over the next token.
Types of hallucination
- Factual hallucination: Inventing specific details (dates, statistics, quotations) that are false or unsupported.
- Citation fabrication: Generating plausible-sounding paper titles, DOIs, or URLs that don't exist.
- Instruction hallucination: Claiming to have performed an action (reading a file, executing code) that wasn't actually done.
- Temporal confusion: Presenting information from training data, which has a cutoff date, as if it were still current.
Why models hallucinate
Training optimizes next-token prediction across a diverse corpus. When the model is asked about something rare in or absent from its training data, it still generates statistically plausible text rather than saying "I don't know." It has no built-in mechanism to distinguish "I'm generating from solid evidence" from "I'm filling in by pattern matching."
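A toy illustration of this point, using made-up numbers and no real model. The logits below are hypothetical scores for the token after "The paper was published in". One set stands for a well-documented fact and the other for an obscure one. Greedy decoding emits a year in both cases, so the generated text looks equally confident even though the second distribution is nearly flat. The helpers `softmax` and `entropy_bits` are defined here only for the sketch.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into a probability distribution over tokens."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def entropy_bits(probs: dict[str, float]) -> float:
    """Shannon entropy in bits; higher means the distribution is less peaked."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical logits for the next token after "The paper was published in".
well_known = {"1997": 9.0, "1998": 2.0, "1996": 1.5, "2001": 1.0}
obscure = {"1997": 1.1, "1998": 1.0, "1996": 0.9, "2001": 1.0}

for name, logits in [("well-known", well_known), ("obscure", obscure)]:
    probs = softmax(logits)
    choice = max(probs, key=probs.get)  # greedy decoding: take the top token
    print(f"{name}: emits {choice!r} (p={probs[choice]:.2f}, "
          f"entropy={entropy_bits(probs):.2f} bits)")
```

Both runs emit the same year. Only the entropy reveals that the second answer is close to a uniform guess among four options, and the user never sees the entropy in the generated text.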
Mitigating hallucinations
- RAG (retrieval-augmented generation): Ground responses in retrieved documents that the model can cite.
- Tool use: Let the model call a search API or database rather than generate facts from memory.
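- Constrained output: Request JSON with required fields. This makes structural content (missing or invented fields) harder to hallucinate, though it does not guarantee that the field values are correct.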
- Self-consistency: Sample several answers and pick the most common one; disagreement between samples suggests the model may be guessing.
- Calibrated uncertainty: Prompt the model to state a confidence level for each claim and to withhold claims it rates as low-confidence.
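Here is a minimal sketch of self-consistency with an abstain rule. It assumes `generate` is any function that returns one sampled completion (temperature above zero) for a prompt. The sketch uses the majority vote share as its confidence signal, which is a swapped-in proxy and not the model's self-reported confidence. It returns `None` when agreement falls below `min_agreement`. The threshold of 0.6 and the exact-string comparison after lowercasing are illustrative assumptions. Real answers usually need normalization before voting (for example, extracting only the final number).

```python
import itertools
from collections import Counter
from typing import Callable, Optional


def self_consistent_answer(
    generate: Callable[[str], str],
    prompt: str,
    n_samples: int = 5,
    min_agreement: float = 0.6,  # illustrative threshold, not a recommended value
) -> tuple[Optional[str], float]:
    """Sample several answers and return (majority answer, vote share).

    Returns (None, share) when the samples disagree too much, treating
    low agreement as a sign that the model may be guessing.
    """
    # Crude normalization so "Paris" and "paris " count as the same answer.
    answers = [generate(prompt).strip().lower() for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    share = count / n_samples
    return (best if share >= min_agreement else None), share


# Usage with a hypothetical stand-in for a sampled LLM call.
_fake_outputs = itertools.cycle(["1997", "1997", "1998", "1997", "2001"])


def fake_llm(prompt: str) -> str:
    return next(_fake_outputs)


print(self_consistent_answer(fake_llm, "When was the paper published?"))
```

With the fake outputs above, "1997" wins three of five votes, a share of exactly 0.6, so it is returned. One fewer matching sample would drop the share below the threshold and trigger abstention.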
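The constrained-output and RAG ideas can be combined in a check that runs after generation. This sketch assumes a hypothetical response schema with an `answer` string and a `source_ids` list naming the retrieved documents the answer relies on. It rejects malformed JSON, missing or mistyped fields, and citations to documents that were never retrieved. It does not verify that a cited document actually supports the answer, which needs a separate entailment or human check.

```python
import json

# Hypothetical response schema: field name -> expected Python type after parsing.
REQUIRED_FIELDS = {"answer": str, "source_ids": list}


def validate_grounded_answer(raw: str, retrieved_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the response passed.

    Checks structure (valid JSON object, required fields with the right types)
    and that every cited source ID names a document that was actually retrieved.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            problems.append(f"missing or mistyped field: {field}")
    if problems:
        return problems  # stop here: later checks assume the fields exist

    if not data["source_ids"]:
        problems.append("answer cites no sources")
    for sid in data["source_ids"]:
        # A cited ID outside the retrieved set suggests a fabricated citation.
        if not isinstance(sid, str) or sid not in retrieved_ids:
            problems.append(f"cites unknown source: {sid!r}")
    return problems


retrieved = {"doc-12", "doc-40"}
print(validate_grounded_answer('{"answer": "1997", "source_ids": ["doc-12"]}', retrieved))
print(validate_grounded_answer('{"answer": "1997", "source_ids": ["doc-99"]}', retrieved))
```

A response that fails the check can be retried, sent back to the model with the listed problems, or surfaced to the user as unverified.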