Embeddings

An embedding converts a piece of text, image, or other data into a fixed-length vector of floating-point numbers, typically 384 to 3072 dimensions. Texts with similar meaning produce similar vectors; unrelated texts point in different directions. This geometric property makes embeddings foundational for semantic search, RAG, clustering, and recommendation systems.
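The geometric property described above is usually measured with cosine similarity. The following is a minimal sketch in plain Python; the three 4-dimensional vectors are made-up stand-ins for real model output, and the variable names are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional "embeddings"; real models emit hundreds
# or thousands of dimensions.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.1, 0.9, 0.1]

sim_related = cosine_similarity(cat, kitten)     # close to 1: similar direction
sim_unrelated = cosine_similarity(cat, invoice)  # close to 0: nearly orthogonal
```

A score near 1 means the vectors point the same way; a score near 0 means they are nearly orthogonal, which is how unrelated texts tend to appear.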

How embedding models work

Embedding models are typically transformer encoders (like BERT), often arranged as dual encoders and trained on pairs of similar and dissimilar examples. They map variable-length inputs to fixed-length vectors either by mean-pooling the per-token outputs or by taking the representation of a special [CLS] token.
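The two pooling strategies can be sketched as follows. This is an illustration in plain Python, assuming the encoder has already produced one vector per token; `token_vectors` and `attention_mask` are hypothetical inputs, not the output of any particular library.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, skipping padding (mask == 0)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]

def cls_pool(token_vectors):
    """Use the first position's vector, where BERT-style models place [CLS]."""
    return list(token_vectors[0])

# Hypothetical encoder output: 4 token positions, 3 dimensions each.
token_vectors = [
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],  # padding position, excluded by the mask
]
attention_mask = [1, 1, 1, 0]

sentence_vec_mean = mean_pool(token_vectors, attention_mask)
sentence_vec_cls = cls_pool(token_vectors)
```

Either way, the result has the same length regardless of how many tokens the input contained, which is what makes the vectors comparable.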

Common embedding models

OpenAI text-embedding-3-large: 3072 dimensions, $0.13 per million tokens
OpenAI text-embedding-3-small: 1536 dimensions, $0.02 per million tokens, often sufficient for RAG
Cohere Embed v3: Strong multilingual support with built-in compression
BGE series (BAAI): Open-source, strong performers on the MTEB benchmark
Sentence-Transformers: Open-source library with many models; all-mpnet-base-v2 is a reliable default
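To put the per-token prices listed above in perspective, here is a rough cost estimate for embedding a corpus once. The corpus size (100,000 chunks of 512 tokens) is an assumption chosen for illustration; the prices are the ones quoted in the list and may change.

```python
# Prices in USD per million tokens, as quoted in the list above.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
}

def embedding_cost(total_tokens, model):
    """Estimated one-off cost in USD to embed total_tokens with the given model."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

# Hypothetical corpus: 100,000 chunks of 512 tokens each.
total_tokens = 100_000 * 512

cost_large = embedding_cost(total_tokens, "text-embedding-3-large")
cost_small = embedding_cost(total_tokens, "text-embedding-3-small")
```

For this assumed corpus of 51.2 million tokens, the estimate comes to roughly $6.66 for the large model and $1.02 for the small one.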

Choosing an embedding model

The MTEB (Massive Text Embedding Benchmark) leaderboard is the standard evaluation resource. For RAG workloads, check the Retrieval column scores. Match the embedding model's maximum input length to your chunk size: if you embed 512-token chunks, select a model that accepts at least 512 tokens, because longer inputs are typically truncated. Weigh cost and latency against quality needs; smaller models like text-embedding-3-small are often sufficient for production RAG systems.
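The token-length rule above amounts to a simple filter. The sketch below uses placeholder model names and limits purely for illustration; real limits should be read from each model's documentation.

```python
def models_that_fit(chunk_tokens, max_tokens_by_model):
    """Return the models whose input limit covers the chunk size, sorted by name."""
    return sorted(
        name
        for name, limit in max_tokens_by_model.items()
        if limit >= chunk_tokens
    )

# Placeholder names and limits for illustration only; these are not
# real models or published figures.
candidate_limits = {
    "model-a": 256,
    "model-b": 512,
    "model-c": 8192,
}

usable = models_that_fit(512, candidate_limits)
```

With 512-token chunks, `model-a` is excluded because it could only see half of each chunk; the remaining candidates can then be compared on retrieval score, cost, and latency.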

Embeddings vs LLM knowledge

Embeddings capture surface-level semantic similarity through learned vector representations. They don't possess the broad knowledge or reasoning capabilities of large language models. This is why RAG systems combine embeddings for fast, approximate retrieval with LLMs for comprehension and generation. The embedding retrieves candidate chunks; the LLM reads and reasons over them.
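The division of labor described above can be sketched as a two-step pipeline: rank chunks by vector similarity, then hand the winners to an LLM as context. This is a toy illustration; the chunk vectors, the query vector, and the prompt wording are all assumptions, and the actual LLM call is left out.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, indexed_chunks, k=2):
    """Rank (text, vector) pairs by similarity to the query and keep the top k texts."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Assemble the text an LLM would read; the wording is illustrative."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical pre-computed chunk embeddings (3 dimensions for readability).
indexed_chunks = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.1, 0.9, 0.1]),
    ("Refund requests need an order number.", [0.8, 0.0, 0.3]),
]
query_vec = [1.0, 0.0, 0.1]  # stand-in for the embedded question

top_chunks = retrieve(query_vec, indexed_chunks, k=2)
prompt = build_prompt("How do refunds work?", top_chunks)
```

The retrieval step only compares directions in vector space; any understanding of what the refund policy actually implies is left to the model that reads the prompt.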

