DeepSeek V3
State-of-the-art open-weights model that shocked the industry with frontier performance at minimal cost
Context window
128K
Input / 1M tokens
$0.27
Output / 1M tokens
$1.10
Provider
DeepSeek
Via DeepSeek API; free to self-host (open weights under MIT-style license)
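As a rough illustration of what the listed prices mean in practice, the sketch below estimates an API bill from token counts. It assumes the per-million-token rates shown above apply uniformly, and it ignores any cache or off-peak discounts. The function name `estimate_cost_usd` is hypothetical and not part of any DeepSeek SDK.

```python
# Listed prices from this page, in USD per 1M tokens (assumed flat rates).
INPUT_USD_PER_M = 0.27
OUTPUT_USD_PER_M = 1.10


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD for the given input and output token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example workload: 2M input tokens and 500K output tokens.
cost = estimate_cost_usd(2_000_000, 500_000)
print(f"${cost:.2f}")
```

For that example workload the estimate comes to about $1.09, split almost evenly between input and output even though there are four times as many input tokens.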
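DeepSeek V3 reached roughly GPT-4o-level benchmark scores after training on a cluster of 2,048 H800 GPUs, at a reported cost of about $5.6M for the final training run. Its Mixture-of-Experts (MoE) architecture activates only 37B of its 671B total parameters per token, so per-token compute is closer to that of a 37B dense model, although the full set of weights still has to be stored to serve it. Because the weights are openly released under a permissive license, you can self-host or fine-tune the model without depending on the API.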
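To make the "37B of 671B" figure concrete, here is a toy sketch of top-k expert routing, the general idea behind sparse MoE layers. It is not DeepSeek's actual router: the scores, the expert count, and the helper names `top_k_route` and `active_fraction` are all illustrative assumptions.

```python
import math


def top_k_route(scores, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected scores only, so they sum to 1.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(active_params_b, total_params_b):
    """Share of parameters used per token, given counts in billions."""
    return active_params_b / total_params_b


# Toy router scores for one token over four hypothetical experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(routes)

# Parameter counts quoted on this page: 37B active out of 671B total.
print(f"{active_fraction(37, 671):.1%}")
```

Only the selected experts run for a given token, which is why the active share works out to roughly 5.5% of the total parameters.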
Strengths
- ✓ Near-GPT-4o quality at a fraction of the price
- ✓ Open weights: self-host or fine-tune freely
- ✓ Efficient MoE architecture reduces inference cost
- ✓ Strong coding results (Aider Polyglot, HumanEval)
- ✓ Good instruction following and structured output
Best for developers who...
Benchmarks
| Benchmark | Score | Notes |
|---|---|---|
| HumanEval | 90.2% | Matches GPT-4o on coding |
| MMLU | 88.5% | Matches GPT-4o on knowledge |
| Aider Polyglot | 55.0% | Strong multi-language coding |
Source: DeepSeek V3 technical report
Compare DeepSeek V3 with
DeepSeek V3 vs Llama 4
Meta - 10M ctx
DeepSeek V3 vs Qwen 3
Alibaba (Qwen Team) - 128K ctx
DeepSeek V3 vs GPT-4o
OpenAI - 128K ctx
DeepSeek V3 vs Claude Sonnet 4.6
Anthropic - 200K ctx