Temperature

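When an LLM generates the next token, it produces a probability distribution over its entire vocabulary. Temperature rescales this distribution before sampling: the model's raw scores (logits) are divided by the temperature before they are converted to probabilities. Temperatures below 1.0 sharpen the distribution (the most likely token becomes overwhelmingly favored), while temperatures above 1.0 flatten it (unlikely tokens get more chances to appear). A temperature of 1.0 leaves the model's distribution unchanged.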
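To make the scaling concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which logits are divided by the temperature before a softmax; the function name and the three example logits are hypothetical and not taken from any particular library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Temperature 0 is handled as greedy decoding (argmax), not by division.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.0]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the top token's share shrinking as the temperature rises, while the least likely token's share grows.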

Temperature values in practice

0: Greedy decoding: always pick the highest-probability token. Deterministic in principle, although hosted APIs can still show small run-to-run differences. Best for factual extraction, code generation, and structured data.
0.1 to 0.3: Near-deterministic with small variation between runs. Good for most production tasks.
0.7 to 1.0: Noticeable variation. Good for creative writing, brainstorming, and diverse generation.
Above 1.0: The distribution is flatter than the model's own prediction, so randomness is very high and output quality degrades quickly as the value rises. Rarely useful in practice.
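The ranges above can be explored with a small sampling sketch. It assumes temperature 0 is implemented as a plain argmax and that other values sample from the temperature-scaled softmax; `sample_token`, `distinct_tokens`, and the example logits are hypothetical names and values chosen for illustration.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index; temperature 0 falls back to greedy argmax."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def distinct_tokens(logits, temperature, draws=1000, seed=0):
    """Count how many different tokens appear across repeated draws."""
    rng = random.Random(seed)
    return len({sample_token(logits, temperature, rng) for _ in range(draws)})

# Hypothetical logits for a five-token vocabulary.
logits = [3.0, 2.0, 1.0, 0.0, -1.0]
for t in (0, 0.2, 0.8, 2.0):
    print(t, distinct_tokens(logits, t))
```

At temperature 0 only one token ever appears; as the temperature rises, progressively less likely tokens start showing up in the draws.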

Temperature and token sampling

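Temperature works together with top-p (nucleus) sampling and top-k sampling, which truncate the distribution rather than reshape it: top-k keeps only the k most likely tokens, and top-p keeps the smallest set of tokens whose cumulative probability reaches p. A common production setting is temperature=0.2 with top-p=0.95 for general tasks, providing slight output variation while avoiding degenerate low-probability tokens. Some models also support min-p sampling as an alternative constraint, which discards tokens whose probability falls below a set fraction of the top token's probability.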
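The three truncation strategies can be sketched as simple filters over an already-computed probability list. This is an illustrative sketch only: the function names are hypothetical, the example probabilities are made up, and real inference libraries differ in the order in which they apply temperature and these filters.

```python
def _renormalize(probs, keep):
    """Zero out tokens not in `keep` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

def min_p_filter(probs, min_p):
    """Keep tokens with probability at least min_p times the top probability."""
    threshold = min_p * max(probs)
    keep = {i for i, q in enumerate(probs) if q >= threshold}
    return _renormalize(probs, keep)

# Hypothetical next-token probabilities for a five-token vocabulary.
probs = [0.55, 0.25, 0.12, 0.05, 0.03]
print(top_k_filter(probs, 2))
print(top_p_filter(probs, 0.95))
print(min_p_filter(probs, 0.1))
```

With these example values, top-k=2 keeps two tokens, top-p=0.95 keeps four, and min-p=0.1 keeps three; in each case the surviving probabilities are rescaled to sum to 1 before sampling.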

Temperature does not determine quality

A common misconception is that higher temperature equals worse quality and lower temperature equals better quality. The correct temperature depends entirely on the task. Code generated at temperature=1.0 tends to be unreliable; creative writing at temperature=0 tends to read as robotic. Match temperature to your specific use case.

