Open Source · Alibaba (Qwen Team) · Released 2025-04

Qwen 3

Alibaba's open-weight model family with switchable thinking and non-thinking modes.

Context window

128K tokens (32K for 0.6B/1.7B/4B dense variants)

Input / 1M tokens

Free

Output / 1M tokens

Free

Provider

Alibaba (Qwen Team)

Open-weight models under Apache 2.0 - free to download and self-host. Running costs depend on your own hardware (for example, local runs with Ollama) or on inference provider pricing (e.g., Together AI, Fireworks, Hugging Face). Alibaba Cloud Model Studio also offers hosted API access. · Data verified 2026-07-02

Qwen3 is Alibaba's open-weight large language model series, released April 29, 2025, under Apache 2.0. It spans two MoE models - Qwen3-235B-A22B (235B total / 22B active) and Qwen3-30B-A3B (30B total / 3B active) - plus six dense models (0.6B, 1.7B, 4B, 8B, 14B, 32B). Its signature feature is a dual-mode design: a 'thinking' mode for step-by-step reasoning and a 'non-thinking' mode for fast responses, controllable per task. Larger models support a 128K-token context window (32K on the smallest dense variants), and the series supports 119 languages and dialects with strong coding, math, and agentic performance.

Capability index

Relative estimates (0-100) to place this model against its peers, grounded in published benchmarks.

Dimensions: Coding, Reasoning, Math, Multimodal, Long context, Speed, Cost efficiency

How to access it

Download open weights from Hugging Face, ModelScope, or Kaggle and run via Transformers, vLLM, SGLang, or Ollama. Also available through Alibaba Cloud Model Studio and third-party inference providers.
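As a rough sketch of the served path: vLLM and SGLang can expose an OpenAI-compatible chat endpoint, and the Qwen documentation describes passing `chat_template_kwargs` to switch thinking on or off per request. The base URL below, the helper names `build_chat_request` and `send`, and the `max_tokens` value are assumptions for illustration; the exact field your server honors may differ by version.

```python
import json
import urllib.request

# Assumed local endpoint of an OpenAI-compatible server (e.g. started with `vllm serve`).
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt, thinking=True, model="Qwen/Qwen3-8B"):
    """Build the JSON body for a /chat/completions call (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
        # Forwarded to the Qwen chat template; assumed to be honored by the server.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

def send(body, base_url=BASE_URL):
    """POST the body and return the assistant message text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    body = build_chat_request("Summarize MoE routing in one sentence.", thinking=False)
    # Inspect the payload; call send(body) once a server is actually running.
    print(json.dumps(body, indent=2))
```

Keeping the payload builder separate from the network call makes it easy to check what is sent before pointing it at a real deployment.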

Strengths

✓Open weights under permissive Apache 2.0 license
✓Switchable thinking / non-thinking modes for quality-vs-speed control
✓Wide range of sizes from 0.6B to 235B (MoE)
✓Efficient MoE variants (e.g., 30B-A3B activates only 3B parameters)
✓Broad multilingual support (119 languages and dialects)

Best for

→Self-hosted reasoning and coding assistants
→Applications needing switchable reasoning depth
→Multilingual open-model deployments
→Efficient MoE inference on limited hardware

When to choose it (and when not to)

Reach for Qwen 3 when...

→You need an open, self-hostable model with a permissive license
→You want to toggle deep reasoning on or off per request
→You are building multilingual applications
→You want efficient inference via MoE with few active parameters

Look elsewhere if...

✕You want a fully managed API with no infrastructure (though hosted options exist)
✕You need native multimodal input (base Qwen3 dense/MoE releases are text-focused)
✕You require the newest frontier quality (newer Qwen 3.x releases and frontier hosted models exist)

How to use it

›Enable thinking mode for hard reasoning/math/coding; disable it for fast, cheap responses
›Use the instruction-tuned checkpoints with the Qwen chat template
›Pick an MoE variant (30B-A3B) for a strong quality-to-compute ratio
›Quantize to fit your GPU; smaller dense models suit edge deployment
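To make the quantization advice concrete, here is a back-of-the-envelope sketch of weight storage at different precisions. It uses the nominal parameter counts from the model names and counts weights only; KV cache, activations, and quantization metadata add to this. The 0.8 headroom factor and the helper names are assumptions, not a Qwen recommendation. Note that an MoE model needs all of its parameters resident in memory, not only the active ones.

```python
# Nominal total parameter counts (billions) taken from the model names.
TOTAL_PARAMS_B = {
    "Qwen3-0.6B": 0.6,
    "Qwen3-8B": 8,
    "Qwen3-32B": 32,
    "Qwen3-30B-A3B": 30,    # MoE: all 30B must be loaded, 3B are active per token
    "Qwen3-235B-A22B": 235, # MoE: all 235B must be loaded, 22B are active per token
}

def weight_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def fits(model, bits, gpu_gb, headroom=0.8):
    """True if the weights fit within `headroom` of GPU memory (assumed rule of thumb)."""
    return weight_gb(TOTAL_PARAMS_B[model], bits) <= gpu_gb * headroom

for name, size in TOTAL_PARAMS_B.items():
    row = ", ".join(f"{bits}-bit: {weight_gb(size, bits):.1f} GB" for bits in (16, 8, 4))
    print(f"{name}: {row}")
```

For example, `fits("Qwen3-30B-A3B", 4, 24)` checks whether 4-bit weights would leave room on a 24 GB card under the assumed headroom.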

Quickstart

Python

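from transformers import pipeline

# device_map="auto" (requires accelerate) spreads the weights across available devices.
pipe = pipeline("text-generation", model="Qwen/Qwen3-235B-A22B", device_map="auto")
messages = [{"role": "user", "content": "Solve: what is the derivative of x^3?"}]
# With chat-style input, generated_text is the full message list; the last entry is the reply.
print(pipe(messages, max_new_tokens=256)[0]["generated_text"][-1]["content"])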

Install `transformers` and `accelerate`. For a smaller footprint use Qwen/Qwen3-30B-A3B or Qwen/Qwen3-8B, or run locally with `ollama run qwen3`.
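In thinking mode, Qwen3 model cards describe the reasoning as wrapped in a `<think>...</think>` block ahead of the final answer. The following is a minimal sketch for separating the two in decoded text; `split_thinking` is a hypothetical helper, and it assumes at most one reasoning block that ends with a closing `</think>` tag.

```python
def split_thinking(text):
    """Split decoded output into (reasoning, answer).

    Assumes the reasoning, if present, ends with a closing </think> tag;
    the opening <think> tag may be missing if the chat template emitted it.
    """
    close_tag = "</think>"
    head, sep, tail = text.rpartition(close_tag)
    if not sep:  # no thinking block, e.g. non-thinking mode
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()

sample = "<think>\nPower rule: d/dx x^n = n*x^(n-1).\n</think>\n\nThe derivative is 3x^2."
reasoning, answer = split_thinking(sample)
print(answer)
```

Because reasoning can be long, a small `max_new_tokens` budget may cut the output off before the closing tag; in that case this helper returns the whole text as the answer.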

API model id: Qwen/Qwen3-235B-A22B

Benchmarks

Model	Parameters	Notes
Qwen3-235B-A22B	235B total / 22B active	Flagship MoE; competitive on coding, math, and general benchmarks
Qwen3-30B-A3B	30B total / 3B active	Outcompetes QwQ-32B with ~10x fewer active parameters, according to the Qwen team

Source: Qwen team - Qwen3 blog
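The active-parameter figures in the table can be sanity-checked with a little arithmetic. This sketch uses the nominal counts from the model names and assumes QwQ-32B is a dense model with roughly 32B parameters, all active per token; exact counts differ slightly.

```python
# Nominal (total, active) parameter counts in billions, from the model names.
models = {"Qwen3-235B-A22B": (235, 22), "Qwen3-30B-A3B": (30, 3)}
QWQ_32B_DENSE = 32  # assumed nominal size; dense, so every parameter is active

for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")

ratio = QWQ_32B_DENSE / models["Qwen3-30B-A3B"][1]
print(f"QwQ-32B uses about {ratio:.1f}x the active parameters of Qwen3-30B-A3B")
```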

Compare Qwen 3 with

Qwen 3 vs DeepSeek V3

DeepSeek - 128K tokens ctx

Qwen 3 vs Llama 4

Meta - Up to 10M tokens (Scout); ~1M tokens (Maverick) ctx

Qwen 3 vs Gemma 3

Google DeepMind - 128K tokens (32K for the 1B variant) ctx

Qwen 3 vs Mistral Large

Mistral AI - 128K tokens ctx

Learn the concepts

Fine-tuning · MoE (Mixture of Experts) · Distillation · Tokenization
