Embeddings
Numerical vector representations of text (or images) that capture semantic meaning, so that similar content maps to nearby vectors.
An embedding converts a piece of text into a fixed-length vector of floating-point numbers, typically 384 to 3072 dimensions. Two texts with similar meaning have vectors that point in nearly the same direction (high cosine similarity); unrelated texts point in different directions. This geometric property makes embeddings the backbone of semantic search, RAG, clustering, and recommendation systems.
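The "similar direction" idea is usually measured with cosine similarity. The sketch below uses NumPy and three hand-made 4-dimensional vectors; the numbers are invented purely for illustration and do not come from any real model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (assumed values, not model output).
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.9, 0.8, 0.1]

sim_related = cosine_similarity(cat, kitten)     # close to 1: similar direction
sim_unrelated = cosine_similarity(cat, invoice)  # close to 0: nearly orthogonal
print(f"cat vs kitten:  {sim_related:.3f}")
print(f"cat vs invoice: {sim_unrelated:.3f}")
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.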
How embedding models work
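Embedding models are typically transformer encoders (like BERT) or dual-encoders trained on pairs of similar and dissimilar texts, so that similar pairs end up close together and dissimilar pairs far apart. They map variable-length text to a fixed-length vector either by mean-pooling the per-token output vectors or by taking the output vector of a special [CLS] token.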
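The two pooling strategies can be sketched with NumPy on a made-up matrix of per-token vectors. The token values, the 3-dimensional size, and the helper names `mean_pool` and `cls_pool` are assumptions for illustration; a real encoder would produce these vectors and the attention mask.

```python
import numpy as np

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors, ignoring padding positions (mask == 0)."""
    vectors = np.asarray(token_vectors, dtype=float)
    mask = np.asarray(attention_mask, dtype=float)[:, None]  # shape (seq_len, 1)
    summed = (vectors * mask).sum(axis=0)
    count = max(mask.sum(), 1.0)  # guard against an all-padding input
    return summed / count

def cls_pool(token_vectors):
    """Use the vector at position 0, assumed here to be the [CLS] token."""
    return np.asarray(token_vectors, dtype=float)[0]

# One row per token; the last row stands in for padding.
token_vectors = np.array([
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 4.0],
    [9.0, 9.0, 9.0],  # padding, excluded by the mask
])
attention_mask = [1, 1, 1, 0]

mean_vec = mean_pool(token_vectors, attention_mask)
cls_vec = cls_pool(token_vectors)
print(mean_vec, cls_vec)
```

Either way, the output has the same length regardless of how many tokens went in, which is what lets texts of different lengths be compared directly.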
Common embedding models
- OpenAI text-embedding-3-large: High-quality, 3072 dimensions, $0.13/1M tokens
- Cohere Embed v3: Strong multilingual, built-in compression support
- BGE series (BAAI): Open-source, top performers on the MTEB benchmark
- Sentence-Transformers: Open-source library with many models; all-mpnet-base-v2 is a reliable default
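Dimension count and per-token price translate directly into storage and cost. The back-of-the-envelope sketch below uses the 3072 dimensions and $0.13 per 1M tokens listed above; the corpus size (10,000 chunks of 512 tokens) and the 4-byte float32 storage format are assumptions chosen for illustration.

```python
def embedding_cost_usd(total_tokens, price_per_million_usd):
    """Cost of embedding a corpus at a flat per-token price."""
    return total_tokens / 1_000_000 * price_per_million_usd

def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector storage, assuming float32 (4 bytes per value) and no index overhead."""
    return num_vectors * dimensions * bytes_per_value

# Hypothetical corpus.
num_chunks = 10_000
tokens_per_chunk = 512

cost = embedding_cost_usd(num_chunks * tokens_per_chunk, 0.13)
size = index_size_bytes(num_chunks, 3072)
print(f"one-time embedding cost: ${cost:.2f}")
print(f"raw vector storage: {size / 1e6:.1f} MB")
```

Prices change, so treat the figure as an input to check against the provider's current pricing rather than a constant.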
Choosing an embedding model
The MTEB (Massive Text Embedding Benchmark) leaderboard is the standard point of comparison. For RAG workloads specifically, look at the Retrieval column. Match the embedding model's maximum token length to your chunk size: if you embed 512-token chunks, choose a model with at least 512-token capacity, since input beyond the limit is typically truncated and never reaches the vector.
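That selection rule can be written down as a filter followed by a ranking. In the sketch below, the model names, token limits, and retrieval scores are placeholders, not real leaderboard entries; `Candidate` and `pick_model` are hypothetical helpers.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    max_tokens: int          # maximum input length the model accepts
    retrieval_score: float   # placeholder for a retrieval benchmark score

def pick_model(candidates, chunk_tokens):
    """Return the best-scoring model whose capacity covers the chunk size, or None."""
    fits = [c for c in candidates if c.max_tokens >= chunk_tokens]
    if not fits:
        return None
    return max(fits, key=lambda c: c.retrieval_score)

# Placeholder catalog (invented values).
candidates = [
    Candidate("model-a", 256, 58.0),   # best score, but too short for 512-token chunks
    Candidate("model-b", 512, 55.0),
    Candidate("model-c", 8192, 53.0),
]

choice = pick_model(candidates, chunk_tokens=512)
print(choice.name if choice else "no model fits")
```

Here the highest-scoring model is skipped because it cannot take a full chunk, which is the trade-off the capacity rule is meant to surface.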
Embeddings vs LLM knowledge
Embeddings encode surface-level semantic similarity. They don't "understand" your documents the way an LLM does. This is why RAG combines embeddings (for fast, approximate retrieval) with LLMs (for precise comprehension and generation).
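The division of labor can be sketched as two steps: rank documents by vector similarity, then hand the top matches to an LLM as context. The vectors below are invented 3-dimensional stand-ins for real embeddings, `top_k` and `build_prompt` are hypothetical helpers, and the LLM call itself is left out; only the prompt it would receive is built.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, cosine score) for the k documents most similar to the query."""
    docs = np.asarray(doc_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    q_n = q / np.linalg.norm(q)
    scores = docs_n @ q_n
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]

def build_prompt(question, passages):
    """Assemble the retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

documents = [
    "Refunds are issued within 14 days.",
    "Our office is closed on Sundays.",
    "Refund requests need an order number.",
]
# Toy vectors (assumed values, not model output).
doc_vecs = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.0], [0.8, 0.0, 0.3]]
query_vec = [1.0, 0.0, 0.2]

hits = top_k(query_vec, doc_vecs, k=2)
prompt = build_prompt("How do I get a refund?", [documents[i] for i, _ in hits])
print(prompt)
```

The retrieval step only narrows the candidates; deciding what the passages actually say about the question is left to the model that reads the prompt.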