Open Source · DeepSeek · Released 2025-12

DeepSeek V4 Flash

Low-latency MoE variant of DeepSeek V4 - fewer experts activated per token for faster inference

Context window: 128K

Input / 1M tokens: Free

Output / 1M tokens: Free

Provider: DeepSeek

An open-source variant of DeepSeek V4. API pricing is significantly cheaper than that of US frontier models.

DeepSeek V4 Flash is the low-latency variant of DeepSeek V4. By activating fewer experts per forward pass, it trades some of the full model's capability for lower inference latency. The sparser activation also produces cleaner representation geometry, which makes the model particularly effective for techniques such as steering vectors and activation-based control. It is popular with researchers exploring LLM interpretability.
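
To make "activating fewer experts per forward pass" concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It does not reproduce DeepSeek V4 Flash's actual router, which this page does not describe. The function names (`route_top_k`, `moe_forward`), the toy experts, and the router scores are all hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs, best expert first.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts only.

    Experts that are not selected are never called, which is where the
    latency saving of a smaller k comes from.
    """
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy setup (assumption): four "experts" that each scale the input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, 0.3, 1.5]  # hypothetical router scores for one token

dense_route = route_top_k(router_logits, k=4)   # every expert runs
sparse_route = route_top_k(router_logits, k=2)  # only two experts run
sparse_out = moe_forward([1.0, -1.0], experts, router_logits, k=2)
```

With a smaller `k`, fewer expert functions run for each token, so the cost per token drops while the output mixes fewer expert contributions. That is the capability-for-latency trade described above.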

Strengths

  • Lower latency than full DeepSeek V4
  • Sparser MoE activation - cleaner residual stream representations
  • Effective for LLM steering and interpretability research
  • Open-source weights
  • Strong performance-to-cost ratio
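
For readers unfamiliar with steering vectors, the sketch below shows one common way to build and apply one: take the difference of mean activations between two sets of examples, then add a scaled copy to a hidden state. This page does not specify how steering is done with DeepSeek V4 Flash, so the difference-of-means method, the function names, and the three-dimensional toy activations are all assumptions for illustration. In practice the activations would be read from a chosen layer of the model.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(pos_acts, neg_acts):
    """Difference of means between two sets of activations."""
    mean_pos = mean_vector(pos_acts)
    mean_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mean_pos, mean_neg)]

def apply_steering(hidden, vec, alpha):
    """Shift a hidden state along the steering direction by strength alpha."""
    return [h + alpha * v for h, v in zip(hidden, vec)]

# Toy activations (assumption): in a real setup these would be captured
# from the model on prompts that do / do not show the target behavior.
pos_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
neg_acts = [[0.0, 0.0, 2.0], [0.0, 2.0, 2.0]]

vec = steering_vector(pos_acts, neg_acts)
steered = apply_steering([1.0, 1.0, 1.0], vec, alpha=0.5)
```

The strength `alpha` is a free parameter. Larger values push the hidden state further along the direction, and a negative value steers away from it.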

Best for

  • Latency-sensitive inference pipelines
  • LLM interpretability and steering research
  • Self-hosted low-latency deployments
  • Cost-sensitive production applications
