Chain-of-Thought (CoT)
A prompting technique where the model is asked to reason step-by-step before giving its final answer, often substantially improving accuracy on multi-step tasks.
Chain-of-thought (CoT) prompting instructs a model to "think out loud": to produce a sequence of reasoning steps before committing to a final answer. This simple change can substantially improve performance on tasks that require multi-step reasoning, such as math problems, logic puzzles, code generation, and analysis.
How it works
There are two standard approaches: appending a cue such as "Let's think step by step" to the prompt, or including few-shot examples whose answers demonstrate worked reasoning. Either way, the model produces intermediate steps that build toward the answer instead of jumping directly to a conclusion.
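A minimal sketch of zero-shot CoT in Python follows. The helper names (`build_zero_shot_cot_prompt`, `extract_final_answer`) and the "Answer:" output convention are illustrative assumptions, not part of any library. The sketch only builds the prompt string and parses a completion; sending the prompt to a model is left to whichever API you use.

```python
import re
from typing import Optional

# Hypothetical helpers: no specific model API is assumed.
COT_CUE = "Let's think step by step."


def build_zero_shot_cot_prompt(question: str) -> str:
    """Append the zero-shot CoT cue, plus an (assumed) instruction to end
    with a line of the form 'Answer: <value>' so the result can be parsed."""
    return (
        f"Q: {question}\n"
        "Finish with a final line of the form 'Answer: <value>'.\n"
        f"A: {COT_CUE}"
    )


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


prompt = build_zero_shot_cot_prompt(
    "I have 3 boxes of 4 pens and give away 5 pens. How many are left?"
)
# Pretend the model returned this completion for `prompt`:
completion = "3 boxes of 4 pens is 12 pens. 12 - 5 = 7.\nAnswer: 7"
print(extract_final_answer(completion))  # 7
```

Asking for a fixed final-answer line is a design choice: the reasoning itself is free-form, so a predictable marker makes the answer easy to extract. Taking the last match tolerates a model that revises its answer partway through its reasoning.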
Why it helps
LLMs are next-token predictors, and each token is produced by a fixed amount of computation (one forward pass). Without explicit reasoning tokens, the model must fit all of its "thinking" into the computation that produces the answer token. With chain-of-thought, the generated tokens act as working memory: each written-out step conditions every later step, carrying intermediate state forward. This is especially valuable for arithmetic, multi-hop questions, and anything that requires tracking several constraints at once.
Zero-shot vs few-shot CoT
- Zero-shot CoT: append a cue such as "Think step by step." No examples needed; works well on modern frontier models.
- Few-shot CoT: provide 3-8 solved examples that show the reasoning process. Often higher quality, but costs more tokens and more prompt-writing effort.
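Few-shot CoT can be sketched as a small prompt builder: each exemplar pairs a question with its worked reasoning and final answer, and the new question is appended at the end with an open "A:". The `CoTExample` type, the Q/A layout, the "Answer:" marker, and the two grade-school arithmetic exemplars are illustrative assumptions; a real prompt would typically use 3-8 exemplars drawn from the target task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoTExample:
    """One solved exemplar: a question, its worked reasoning, and the answer."""
    question: str
    reasoning: str
    answer: str


# Illustrative exemplars only; substitute examples from your own task.
EXAMPLES = [
    CoTExample(
        question="The cafeteria had 23 apples. If they used 20 to make lunch "
                 "and bought 6 more, how many apples do they have?",
        reasoning="They started with 23 apples and used 20, leaving "
                  "23 - 20 = 3. They bought 6 more, so 3 + 6 = 9.",
        answer="9",
    ),
    CoTExample(
        question="Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls "
                 "each. How many tennis balls does he have now?",
        reasoning="2 cans of 3 balls is 2 * 3 = 6 balls. 5 + 6 = 11.",
        answer="11",
    ),
]


def build_few_shot_cot_prompt(question: str, examples: List[CoTExample]) -> str:
    """Render each exemplar as Q/A with visible reasoning, then the new question."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} Answer: {ex.answer}"
        for ex in examples
    ]
    # Leave the final answer open so the model continues with its own reasoning.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


print(build_few_shot_cot_prompt(
    "A baker made 4 trays of 6 muffins and sold 10. How many are left?",
    EXAMPLES,
))
```

Because the exemplars show reasoning before the answer, the model tends to imitate that pattern for the new question; the cost is every exemplar's tokens on every request.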
CoT and o1-style models
OpenAI's o1 and similar "reasoning" models (DeepSeek R1, Claude's extended thinking mode) generate a chain of thought on their own at inference time, without needing to be prompted for it. The reasoning tokens are generated but are often hidden from the user or shown only in summarized form. This approach has led many benchmarks on hard reasoning tasks; in effect, it spends more test-time compute to improve answer quality.
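When a reasoning model does return its reasoning inline, client code usually needs to separate it from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, the convention used by the open-weight DeepSeek-R1 releases; hosted APIs expose (or withhold) reasoning through their own response formats, so treat the tag format and the hypothetical `split_reasoning` helper as assumptions.

```python
import re
from typing import Tuple

# Assumed delimiter convention; DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> Tuple[str, str]:
    """Split a raw completion into (reasoning, visible_answer).

    If no <think> block is present, the reasoning is empty and the whole
    completion is treated as the answer.
    """
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(raw))
    answer = THINK_RE.sub("", raw).strip()
    return reasoning, answer


raw = "<think>12 - 5 = 7, so seven pens remain.</think>\nSeven pens are left."
reasoning, answer = split_reasoning(raw)
print(answer)  # Seven pens are left.
```

Keeping the reasoning rather than discarding it can help with debugging or logging, while only the answer is shown to the end user.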
Models relevant to Chain-of-Thought (CoT)
o1
OpenAI's reasoning model that thinks before it answers, best suited to hard science and math
Claude Opus 4.7
Anthropic's most capable model for tasks that demand deep reasoning and precision
Gemini 2.5 Pro
Google's most capable model, with a 1M-token context window and top science benchmark scores