LoRA (Low-Rank Adaptation)

LoRA (Low-Rank Adaptation) is a fine-tuning technique that avoids updating all model weights. Instead, it freezes the original weights and adds two small matrices (A and B) to each attention layer. Only A and B are trained. Because they're low-rank (their inner dimension is small), the total number of trainable parameters is a tiny fraction of the full model's weight count.

Practical efficiency gains

Full fine-tuning a 70B parameter model requires multiple high-end GPUs, weeks of training, and significant storage. LoRA reduces trainable parameters by roughly 10,000x while achieving comparable results on most tasks. A 70B model can be LoRA fine-tuned on a single 80GB A100 GPU in hours. The resulting LoRA adapter (the A and B matrices) is typically 10-100MB rather than hundreds of GB, making distribution and deployment straightforward.

QLoRA: Further reducing hardware requirements

QLoRA combines LoRA with 4-bit quantization of the base model. The frozen base weights are stored in 4-bit precision, while LoRA adapters remain in 16-bit precision. This reduces memory requirements further, allowing you to fine-tune a 70B model on a single 48GB GPU, accessible via consumer-grade hardware or standard cloud GPU instances.

When LoRA is appropriate

LoRA works best when the base model already possesses relevant capabilities and you're adapting behavior for a specific task, format, or domain. Common use cases include instruction tuning, style transfer, domain-specific terminology, and task-specific adaptation. LoRA typically underperforms when injecting entirely new knowledge that the base model lacks or when substantially changing fundamental model behavior. In those cases, continued pre-training or full fine-tuning may be necessary.

Adapter composition and merging

A key advantage of LoRA is that multiple adapters can be trained independently and combined at inference time without retraining. You can also merge a LoRA adapter back into model weights to recover the original inference speed, though this eliminates the storage and distribution benefits. This flexibility has made LoRA the standard for open-source fine-tuning workflows.

Practical efficiency gains

QLoRA: Further reducing hardware requirements

When LoRA is appropriate

Adapter composition and merging

Related terms

Models relevant to LoRA (Low-Rank Adaptation)