Top-p (Nucleus Sampling)

Top-p sampling, also called nucleus sampling, is an alternative to top-k sampling. Instead of keeping a fixed number of candidates, as top-k does, top-p keeps the smallest set of most probable tokens whose combined probability reaches at least p, then samples from that set after renormalizing. The candidate pool therefore adapts to the model's confidence: when the model is confident, only 2-3 tokens might qualify; when it is uncertain, dozens might.
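The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation; the function name top_p_filter and the two toy distributions are made up for the example.

```python
def top_p_filter(probs, p):
    """Return {token: renormalized probability} for the top-p nucleus.

    probs is a dict mapping tokens to probabilities that sum to 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        # Stop as soon as the kept probability mass reaches p.
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}


# Hypothetical next-token distributions.
confident = {"Paris": 0.92, "Lyon": 0.04, "Nice": 0.03, "Rome": 0.01}
uncertain = {"red": 0.30, "blue": 0.25, "green": 0.20, "gold": 0.17, "pink": 0.08}

print(len(top_p_filter(confident, 0.9)))  # 1 token survives
print(len(top_p_filter(uncertain, 0.9)))  # 4 tokens survive
```

With the same p = 0.9, the peaked distribution keeps a single candidate while the flatter one keeps four, which is the adaptive behavior described above.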

Top-p vs top-k vs temperature

These three parameters influence token selection differently:

Temperature reshapes the probability distribution before sampling
Top-k restricts selection to the k most likely tokens
Top-p restricts selection to the smallest set of top-ranked tokens whose cumulative probability reaches p
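The way the three parameters interact can be sketched as a small pipeline. The order used here (temperature, then top-k, then top-p) is one common convention and is an assumption of this sketch; real implementations may differ. The names nucleus and sample and the toy logits are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return the indices of tokens that remain candidates, best first."""
    probs = softmax_with_temperature(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]  # top-k: keep a fixed number of tokens
    total = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / total
        if cumulative >= top_p:  # top-p: stop once mass p is covered
            break
    return kept


def sample(logits, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    """Draw one token index from the truncated distribution."""
    probs = softmax_with_temperature(logits, temperature)
    kept = nucleus(logits, temperature, top_k, top_p)
    weights = [probs[i] for i in kept]  # choices() normalizes the weights
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, 0.0, -1.0]  # toy scores for five tokens
print(nucleus(logits, temperature=1.0, top_p=0.9))  # [0, 1, 2, 3]
print(nucleus(logits, temperature=0.5, top_p=0.9))  # [0, 1]
print(nucleus(logits, top_k=2))                     # [0, 1]
```

Lowering the temperature sharpens the distribution, so the same top-p value admits fewer tokens. This is why the two settings are usually tuned together rather than in isolation.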

In practice, temperature is most often paired with top-p: temperature scaling is applied first, and top-p then truncates the reshaped distribution. Top-k is exposed by fewer modern APIs.

Common top-p values

0.1: Very conservative; heavily favors the most likely completions
0.9-0.95: Commonly used range and the default in some APIs (others default to 1.0); allows diversity without erratic outputs
1.0: No truncation; all tokens remain candidates
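To make these values concrete, the sketch below counts how many tokens survive at each setting for one made-up distribution over ten tokens. The numbers are purely illustrative; real distributions span a vocabulary of many thousands of tokens, and nucleus_size is a hypothetical helper.

```python
def nucleus_size(sorted_probs, p):
    """Count the tokens needed to reach cumulative probability p.

    sorted_probs must already be ordered from most to least probable.
    """
    cumulative = 0.0
    for count, prob in enumerate(sorted_probs, start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    # Rounding can leave the running sum just under 1.0; keep everything.
    return len(sorted_probs)


# Hypothetical next-token probabilities, most likely first.
probs = [0.40, 0.20, 0.12, 0.08, 0.06, 0.05, 0.03, 0.03, 0.02, 0.01]

for p in (0.1, 0.9, 0.95, 1.0):
    print(p, nucleus_size(probs, p))
```

For this distribution the loop reports 1, 6, 8, and 10 surviving tokens respectively: at 0.1 sampling is effectively greedy, while at 1.0 nothing is removed.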

Practical considerations

For most API use cases, the provider's default top-p value (often between 0.9 and 1.0) requires no adjustment. Changing top-p alone rarely improves output quality significantly. If output quality is poor, examine the prompt and the temperature setting before modifying top-p.
