Closed Source · OpenAI · Released 2024-12

o1

OpenAI's first-generation deep-reasoning model that thinks before answering

Context window

200,000 tokens (100,000 max output)

Input / 1M tokens

$15.00

Output / 1M tokens

$60.00

Provider

OpenAI

Cached input: $7.50 per 1M tokens. OpenAI's docs describe o1 as the 'previous full o-series reasoning model'; it remains available but has been superseded by newer reasoning models. Output tokens include the billed internal reasoning tokens. · Data verified 2026-07-02
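Because reasoning tokens are billed as output, a request's cost depends on tokens you never see. The sketch below applies the rates listed above ($15.00 input, $7.50 cached input, $60.00 output per 1M tokens). The helper name is hypothetical, and it assumes cached tokens are reported as a subset of the prompt tokens, as OpenAI's usage fields do.

```python
# o1 list prices in USD per 1M tokens, as given on this page (verify before relying on them).
INPUT_PER_M = 15.00
CACHED_INPUT_PER_M = 7.50
OUTPUT_PER_M = 60.00


def estimate_o1_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one o1 request (hypothetical helper).

    input_tokens  -- total prompt tokens, including any cached ones
    cached_tokens -- the subset of input_tokens served from the prompt cache
    output_tokens -- visible answer tokens plus hidden reasoning tokens
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (
        uncached * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


# 2,000 prompt tokens and a 500-token answer that needed 8,000 reasoning tokens:
print(f"${estimate_o1_cost(2_000, 500 + 8_000):.4f}")  # $0.5400
```

In this example the 8,000 hidden reasoning tokens account for $0.48 of the $0.54 total.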

o1 is OpenAI's original o-series reasoning model, released in full on December 17, 2024 (o1-preview appeared in September 2024). It generates a long internal chain of thought before responding, which makes it strong on hard math, science, and competitive programming. It has a 200,000-token context window with up to 100,000 output tokens (billed reasoning tokens included), accepts text and image input, and has a knowledge cutoff of October 1, 2023. Benchmark highlights: 74% on AIME 2024 (single sample), 77.3% on GPQA Diamond (above the ~69.7% scored by PhD-level human experts), and a Codeforces ranking around the 89th-90th percentile. It is priced at $15 input / $60 output per 1M tokens and is now a previous-generation reasoning model.
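The 200,000-token window and the 100,000-token output cap interact. Assuming, as is usual for OpenAI models, that the prompt and all output (visible plus reasoning) share the window, the output budget left for a given prompt is the smaller of the cap and whatever remains. A minimal sketch; the helper name is hypothetical:

```python
CONTEXT_WINDOW = 200_000  # tokens shared by prompt and output (assumed shared)
MAX_OUTPUT = 100_000      # o1's cap on output tokens, reasoning included


def max_output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens can still be requested for a prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    return min(MAX_OUTPUT, remaining)


print(max_output_budget(50_000))   # 100000: the output cap is the limit
print(max_output_budget(150_000))  # 50000: the context window is the limit
```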

Capability index

Relative estimates (0-100) to place this model against its peers, grounded in published benchmarks.

Dimensions scored: Coding, Reasoning, Math, Multimodal, Long context, Speed, Cost efficiency.

How to access it

Available in the OpenAI API via model id 'o1' (snapshot 'o1-2024-12-17'). The 'o1-preview-2024-09-12' snapshot is deprecated. Positioned as the previous full o-series reasoning model; still usable but superseded by newer reasoning models.


Strengths

✓Deep step-by-step reasoning for hard math and science problems
✓Strong competitive-programming performance (~89th-90th percentile Codeforces)
✓Exceeds PhD-level human accuracy on GPQA Diamond (77.3% vs ~69.7%)
✓Large 200K context and 100K output budget for long reasoning chains
✓Multimodal text + image input

Best for developers who...

Hard math and science reasoning · Competitive and algorithmic programming · Deliberate multi-step problem solving

When to choose it (and when not to)

Reach for o1 when...

→Hard, multi-step math, science, and logic problems that reward deliberate reasoning
→Competitive programming and algorithmic problem solving
→Existing o1-based pipelines already validated for reasoning tasks

Look elsewhere if...

✕Latency- or cost-sensitive tasks (it is slow and expensive: $15/$60 per 1M)
✕Simple or purely conversational tasks (use GPT-4o or a newer general model)
✕New projects that could use newer, cheaper, stronger o-series/GPT-5 reasoning models

How to use it

›Keep prompts simple and direct; avoid explicit 'think step by step' instructions, since o1 reasons internally by design
›Do not add few-shot chain-of-thought examples; they can hurt reasoning-model performance
›Budget generously for output/reasoning tokens on hard problems
›Provide only the essential context; state the goal and constraints clearly

Quickstart

Python

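from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",  # or pin the snapshot "o1-2024-12-17"
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
# Only the final answer is returned; the internal reasoning stays hidden.
print(response.choices[0].message.content)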

Pin the snapshot 'o1-2024-12-17' for reproducible behavior. Because o1 reasons internally, avoid chain-of-thought prompting, and expect higher latency and token usage.
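Because o1's reasoning is hidden but billed, it helps to log how much of each response's output was reasoning. In the OpenAI Python SDK, Chat Completions usage reports `completion_tokens` and, for reasoning models, `completion_tokens_details.reasoning_tokens`. The helper below (a hypothetical name) reads those fields defensively so it still works when the details are absent; the stand-in usage object and its numbers are made up for illustration.

```python
from types import SimpleNamespace


def split_output_tokens(usage) -> dict:
    """Split a Chat Completions usage object into reasoning vs visible output tokens."""
    total_output = usage.completion_tokens
    details = getattr(usage, "completion_tokens_details", None)
    # Missing details (or a None count) are treated as zero reasoning tokens.
    reasoning = getattr(details, "reasoning_tokens", None) or 0
    return {
        "prompt": usage.prompt_tokens,
        "reasoning": reasoning,
        "visible": total_output - reasoning,
        "billed_output": total_output,  # reasoning tokens are included here
    }


# With a real call: print(split_output_tokens(response.usage))
# Stand-in object with the same shape, for illustration only:
fake_usage = SimpleNamespace(
    prompt_tokens=40,
    completion_tokens=1_800,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=1_536),
)
print(split_output_tokens(fake_usage))
```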

API model id: o1

Benchmarks

Benchmark	Score	Notes
AIME 2024	74%	Single sample per problem (pass@1); rises to ~83% with 64-sample consensus, per OpenAI.
GPQA Diamond	77.3%	Zero-shot pass@1, exceeding PhD-level human experts (~69.7%), per OpenAI.
Codeforces	~89th percentile	Competitive programming, per OpenAI. The often-cited Elo 1807 (better than ~93% of competitors) was reported for o1-ioi, a separately fine-tuned variant, not for o1 itself.
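The AIME row contrasts two scoring rules: pass@1 grades a single sampled answer per problem, while 64-sample consensus takes the most common answer across 64 samples. The toy sketch below shows the difference on made-up answers. It illustrates majority voting in general, not OpenAI's evaluation harness; pass@1 is computed as mean per-sample accuracy (the expected single-sample score), and the tie-breaking rule (first-seen answer wins) is an assumption.

```python
from collections import Counter


def pass_at_1(samples: list[list[str]], answers: list[str]) -> float:
    """Expected accuracy with one sample per problem: mean per-sample correctness."""
    per_problem = [
        sum(s == ans for s in problem_samples) / len(problem_samples)
        for problem_samples, ans in zip(samples, answers)
    ]
    return sum(per_problem) / len(per_problem)


def consensus_accuracy(samples: list[list[str]], answers: list[str]) -> float:
    """Accuracy when each problem is answered by majority vote over its samples."""
    correct = 0
    for problem_samples, ans in zip(samples, answers):
        # most_common(1) gives the top answer; ties go to the first answer seen.
        voted, _ = Counter(problem_samples).most_common(1)[0]
        correct += voted == ans
    return correct / len(answers)


# Made-up answers: 3 problems x 4 samples each (the AIME figure uses 64).
samples = [
    ["204", "204", "17", "204"],
    ["33", "12", "33", "33"],
    ["5", "8", "9", "8"],
]
answers = ["204", "33", "9"]
print(round(pass_at_1(samples, answers), 3))           # 0.583
print(round(consensus_accuracy(samples, answers), 3))  # 0.667
```

Voting helps when the correct answer is the most frequent one even though individual samples are often wrong; it cannot rescue a problem like the third one, where a wrong answer wins the vote.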

Source: OpenAI - Learning to reason with LLMs

Compare o1 with

o1 vs GPT-5

OpenAI · context: 400,000 tokens (128,000 max output)


o1 vs GPT-4o

OpenAI · context: 128,000 tokens (16,384 max output)


o1 vs Claude Opus 4.7

Anthropic · context: 1M tokens


o1 vs Gemini 2.5 Pro

Google DeepMind · context: 1,048,576 tokens (1M) input; up to 65K output



Learn the concepts

Chain-of-Thought (CoT) · Inference Latency vs Throughput
