LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that adds small trainable matrices to frozen model weights, enabling fast and cheap fine-tuning.
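LoRA (Low-Rank Adaptation) is a fine-tuning technique that avoids updating all model weights. Instead, it freezes the original weights and adds a pair of small trainable matrices, A and B, alongside selected weight matrices, most commonly the attention projections in each transformer layer. For a frozen weight W of shape d × k, the update is the product BA, where B is d × r and A is r × k, so the adapted layer computes Wx + BAx. Only A and B are trained. Because the rank r (the inner dimension) is much smaller than d and k, the total number of trainable parameters is a tiny fraction of the full model's weight count.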
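As a rough illustration of the idea, here is a minimal pure-Python sketch of a single LoRA-adapted linear map. The toy sizes, the helper names (`matmul`, `lora_forward`, `lora_param_count`), and the `alpha / r` scaling value are assumptions chosen for illustration. Initializing B to zero, so that the adapted layer starts out identical to the frozen one, follows the convention described in the LoRA paper.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """Compute h = W x + (alpha / r) * B (A x) for a column vector x."""
    base = matmul(W, x)               # frozen path, shape (d_out, 1)
    update = matmul(B, matmul(A, x))  # low-rank path through the rank-r bottleneck
    scale = alpha / r
    return [[b[0] + scale * u[0]] for b, u in zip(base, update)]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

random.seed(0)
d_out, d_in, r, alpha = 6, 8, 2, 4  # hypothetical toy sizes

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weights
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, random init
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init
x = [[random.gauss(0, 1)] for _ in range(d_in)]

h0 = lora_forward(x, W, A, B, alpha, r)

# For a realistically sized 4096 x 4096 projection at rank 8:
full = 4096 * 4096
lora = lora_param_count(4096, 4096, 8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

Because B starts at zero, `h0` equals the output of the frozen layer alone; training then moves A and B while W never changes.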
Why LoRA dominates open-source fine-tuning
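Full fine-tuning a 70B-parameter model requires many high-end GPUs, long training runs, and significant storage, because gradients and optimizer state must be kept for every weight. LoRA can reduce trainable parameters by up to 10,000x (the figure the LoRA paper reports for GPT-3 175B) while achieving comparable results on many tasks; the exact reduction depends on the rank and on which matrices are adapted. The frozen base weights still have to fit in memory, however: a 70B model in 16-bit precision occupies about 140GB, so plain LoRA at that scale still needs several GPUs or offloading, whereas smaller models can be LoRA fine-tuned on a single A100 in hours. The resulting LoRA adapter (the A and B matrices) is typically tens to a few hundred MB, rather than the 100GB+ of a full 70B checkpoint.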
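The size claims above can be checked with back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not a measurement: the hidden size, layer count, rank, number of adapted matrices per layer, and the simplification that every adapted matrix is square are all hypothetical values picked to resemble a 70B-class transformer.

```python
def lora_adapter_stats(hidden_size, num_layers, matrices_per_layer, rank,
                       base_params, bytes_per_param=2):
    """Estimate adapter size, assuming each adapted matrix is hidden_size x hidden_size."""
    per_matrix = rank * (hidden_size + hidden_size)  # A is r x d, B is d x r
    trainable = per_matrix * matrices_per_layer * num_layers
    return {
        "trainable_params": trainable,
        "adapter_mb": trainable * bytes_per_param / 1e6,  # 16-bit storage by default
        "reduction_factor": base_params / trainable,
    }

# Hypothetical 70B-class configuration: two adapted projections per layer, rank 16.
stats = lora_adapter_stats(
    hidden_size=8192,
    num_layers=80,
    matrices_per_layer=2,
    rank=16,
    base_params=70e9,
)
print(stats)
```

Under these assumptions the adapter comes to roughly 42 million parameters, or about 84MB in 16-bit. The reduction factor scales inversely with rank and with the number of adapted matrices, so the headline figure varies considerably between setups.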
QLoRA: LoRA at even lower cost
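QLoRA combines LoRA with 4-bit quantization of the base model. The frozen base weights are stored in 4-bit precision and dequantized on the fly for computation, while the LoRA adapters are kept in 16-bit. At 4 bits per weight, a 70B model's weights take roughly 35GB, which lets you fine-tune it on a single 48GB GPU. That is workstation-class hardware or a single cloud GPU instance, though still beyond typical 24GB consumer cards.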
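To see why 4-bit storage makes the difference, here is a small sketch of the weight-memory arithmetic. It counts only the stored base weights; activations, gradients, optimizer state for the adapters, and quantization metadata are deliberately ignored, so real usage will be higher. The function name `weight_memory_gb` and the 48GB budget variable are illustrative assumptions.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed just to store the weights, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

num_params = 70e9      # 70B-parameter base model
gpu_budget_gb = 48     # hypothetical single-GPU memory budget

base_16bit = weight_memory_gb(num_params, 16)
base_4bit = weight_memory_gb(num_params, 4)

print(f"16-bit base weights: {base_16bit:.0f} GB, fits: {base_16bit < gpu_budget_gb}")
print(f"4-bit base weights:  {base_4bit:.0f} GB, fits: {base_4bit < gpu_budget_gb}")
```

The remaining headroom between the 4-bit weights and the GPU budget is what has to cover the adapters, their optimizer state, and activations, which is why batch size and sequence length still matter in practice.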
LoRA vs full fine-tuning
LoRA is best when the task is well-defined and the base model already has relevant capabilities. If you're trying to inject entirely new knowledge or radically change model behavior, full fine-tuning or continued pre-training may be necessary. For most practical adaptation tasks (format, style, domain-specific terminology), LoRA is the right default.
Models relevant to LoRA (Low-Rank Adaptation)
Llama 4
Meta's multimodal open-weights model family with a 10M context window variant
Qwen 3
Alibaba's highly capable open-weights model with top-tier multilingual performance
Gemma 3
Google's open-weights model family optimized for on-device and edge deployment
Mistral Large
Europe's leading frontier model - strong on code, multilingual tasks, and function calling