Inference

Top-p (Nucleus Sampling)

A sampling strategy that limits token selection to the smallest set of most probable tokens whose cumulative probability reaches a threshold p.

Top-p sampling (also called nucleus sampling) is an alternative to top-k sampling. Instead of keeping a fixed number of candidates, as top-k does, top-p ranks tokens by probability and keeps the smallest set whose combined probability is at least p. The probabilities of the surviving tokens are then renormalized, and the next token is sampled from that set. The size of the set adapts to the model's uncertainty: when the model is confident, only 2-3 tokens might be needed to reach the threshold; when it is uncertain, dozens might.
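The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than any particular library's implementation; the function names top_p_filter and sample_top_p, and the toy distributions in the usage note, are made up for this example.

```python
import random


def top_p_filter(probs, p):
    """Return {token: renormalized probability} for the nucleus.

    probs: dict mapping token -> probability (assumed to sum to about 1).
    p: cumulative probability threshold in (0, 1].
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")

    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # Keep adding tokens until the running total reaches p.
    nucleus = []
    cumulative = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break

    # Renormalize so the kept probabilities sum to 1.
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}


def sample_top_p(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With a peaked distribution such as {"the": 0.85, "a": 0.10, "an": 0.03, "this": 0.02} and p = 0.9, only "the" and "a" survive. With eight equally likely tokens (0.125 each) and the same p, all eight are kept.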

Top-p vs top-k vs temperature

These three parameters all influence token selection:

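  • Temperature reshapes the probability distribution: values below 1 sharpen it toward the most likely tokens, values above 1 flatten it
  • Top-k keeps the k most likely tokens, regardless of how much probability they cover
  • Top-p keeps the most likely tokens until their cumulative probability reaches p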

Many production systems combine temperature with top-p. Top-k is less commonly exposed in modern hosted APIs.
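To make the difference between the three parameters concrete, here is a rough sketch that applies them in sequence to a list of raw logits. The order used (temperature, then top-k, then top-p) is an assumption for illustration; real inference stacks differ in ordering and details. All function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then convert to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs, keep):
    """Zero out every index not in keep and rescale the rest to sum to 1."""
    keep = set(keep)
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def apply_top_k(probs, k):
    """Keep the k most likely token indices."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])


def apply_top_p(probs, p):
    """Keep the most likely indices until their cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)


def sample_token(logits, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    """Return the index of one sampled token (illustrative pipeline only)."""
    probs = softmax_with_temperature(logits, temperature)
    if top_k is not None:
        probs = apply_top_k(probs, top_k)
    probs = apply_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Note that temperature changes how much probability each token has, while top-k and top-p only decide which tokens remain eligible. Lowering the temperature therefore also tends to shrink the set that top-p keeps.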

Common top-p values

  • 0.1: Very conservative. Sticks closely to the most likely completions.
  • 0.9 to 0.95: A common default in many inference stacks. Allows some diversity without wild outputs.
  • 1.0: No truncation. All tokens remain eligible.
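As a small worked example of the values above, the sketch below counts how many tokens survive at each threshold for one toy distribution. The helper nucleus_size and the probabilities are invented for illustration; they do not come from a real model.

```python
def nucleus_size(probs, p):
    """Count how many of the most likely tokens are needed to reach p."""
    cumulative = 0.0
    count = 0
    for prob in sorted(probs, reverse=True):
        cumulative += prob
        count += 1
        if cumulative >= p:
            break
    return count


# Toy next-token distribution over five tokens (sums to 1).
toy_probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]

for p in (0.1, 0.9, 1.0):
    print(f"top_p={p}: {nucleus_size(toy_probs, p)} of {len(toy_probs)} tokens kept")
```

For this distribution, p = 0.1 keeps only the single most likely token, p = 0.9 keeps four, and p = 1.0 keeps all five.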

When it matters

For most API use cases, the default top-p is fine. Defaults are often in the 0.9-0.95 range, though some APIs default to 1.0, so check the documentation for the one you use. Tuning top-p rarely makes a meaningful difference on its own. If output quality is the issue, look at the prompt and the temperature setting before adjusting top-p.
