AI Glossary
Plain-English explanations of LLM and AI terms that matter for developers. No fluff, no hype - just what you need to understand to build with AI.
Core Concepts
Agentic AI
AI systems that autonomously execute multi-step tasks by deciding which tools to use, in what order, and when to stop.
Context Window
The maximum number of tokens an LLM can process in a single inference call, including both input (prompt, context, history) and output (generated response).
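Because input and output share one budget, a long prompt leaves less room for the response. The arithmetic can be sketched as follows; the function name and the 8,192-token window are illustrative assumptions, not the limits of any particular model.

```python
def max_output_tokens(context_window: int, prompt_tokens: int) -> int:
    """Return how many tokens remain for the response after the prompt."""
    if prompt_tokens > context_window:
        raise ValueError("prompt alone exceeds the context window")
    return context_window - prompt_tokens


# Hypothetical numbers: an 8,192-token window and a 6,000-token prompt.
remaining = max_output_tokens(context_window=8192, prompt_tokens=6000)
print(remaining)  # 2192
```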
Inference
The process of running a trained model to generate outputs from inputs, also called the serving phase as opposed to training.
Tokenization
The process of converting text into a sequence of integers that an LLM can process, where each integer represents a token - typically a subword fragment, full word, or punctuation character.
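A toy illustration of the text-to-integers mapping: the sketch below uses greedy longest-match over a tiny made-up vocabulary. Production tokenizers typically learn their vocabulary (for example with byte-pair encoding) and handle arbitrary input, so treat `VOCAB`, `encode`, and `decode` as hypothetical names for illustration only.

```python
# Made-up vocabulary mapping subword pieces to integer IDs.
VOCAB = {"token": 0, "ization": 1, "un": 2, "believ": 3, "able": 4, " ": 5, "!": 6}


def encode(text, vocab=VOCAB):
    """Greedily match the longest known piece at each position."""
    ids = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


def decode(ids, vocab=VOCAB):
    """Map IDs back to their pieces and join them."""
    inverse = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(inverse[token_id] for token_id in ids)


ids = encode("tokenization unbelievable!")
print(ids)          # [0, 1, 5, 2, 3, 4, 6]
print(decode(ids))  # tokenization unbelievable!
```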
Architecture
Embeddings
Numerical vector representations of text, images, or other data that capture semantic meaning so similar content has similar vector coordinates.
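"Similar vector coordinates" is usually measured with cosine similarity. A minimal sketch, assuming hand-written 3-dimensional vectors in place of real model embeddings (which typically have hundreds or thousands of dimensions):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors, not the output of any embedding model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related concepts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```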
MCP (Model Context Protocol)
An open protocol released by Anthropic that standardizes how AI assistants connect to external data sources, APIs, and tools through a unified interface.
MoE (Mixture of Experts)
A neural network architecture where many specialized sub-networks (experts) exist within a single model, but only a subset is activated for each input token, enabling large total parameter counts with lower per-token compute costs.
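The "only a subset is activated" idea can be sketched with top-k routing. This is a simplified illustration: in a real model the router logits come from a learned layer applied to the token, and the experts are feed-forward networks. Here the logits are passed in directly and the experts are trivial functions, all of them hypothetical.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax over just those."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four toy experts; only two of them run for this input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]

print(top_k_route(logits))               # {1: 0.5, 3: 0.5}
print(moe_forward(3.0, experts, logits))  # 7.5
```

Total parameters grow with the number of experts, while per-token compute grows only with `k`.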
Multimodal
Models that process and generate multiple types of data - text, images, audio, and video - within a unified architecture.
RAG (Retrieval-Augmented Generation)
A technique that improves LLM responses by retrieving relevant documents from an external knowledge base before generation, reducing hallucinations and enabling access to information beyond training data.
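The retrieve-then-generate flow can be sketched in a few lines. As a stand-in for embedding similarity, this example ranks documents by shared words, and it stops at building the prompt rather than calling a model. `DOCS`, `retrieve`, and `build_prompt` are hypothetical names.

```python
# A tiny made-up knowledge base.
DOCS = [
    "The KV cache stores attention keys and values for earlier tokens.",
    "LoRA trains low-rank adapter matrices while base weights stay frozen.",
    "Top-p sampling keeps the smallest set of tokens whose probability mass reaches p.",
]


def words(text):
    """Lowercase the text, drop basic punctuation, and split into a word set."""
    return set(text.lower().replace(".", "").replace("?", "").split())


def retrieve(query, docs, k=1):
    """Return the k documents sharing the most words with the query."""
    query_words = words(query)
    ranked = sorted(docs, key=lambda d: len(query_words & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Place the retrieved text ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


print(build_prompt("What does the KV cache store?", DOCS))
```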
Vector Database
A database optimized for storing and querying high-dimensional embedding vectors, enabling fast approximate nearest-neighbor search.
Training
Distillation
Training a smaller model to replicate the behavior of a larger model by learning from its output distributions rather than ground-truth labels alone.
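One common way to combine the teacher's output distribution with ground-truth labels is a weighted sum of a KL-divergence term and a cross-entropy term. The sketch below assumes that formulation; `temperature`, `alpha`, and the temperature-squared scaling are conventional choices, not something this glossary prescribes.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target term (match the teacher) with a hard-label term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) over the softened distributions.
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    # Standard cross-entropy against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


teacher = [2.0, 1.0, 0.1]
matching = distillation_loss([2.0, 1.0, 0.1], teacher, label=0)
mismatched = distillation_loss([0.1, 1.0, 2.0], teacher, label=0)
print(matching < mismatched)  # True
```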
Fine-tuning
Continued training of a pre-trained model on a smaller, task-specific dataset to adapt its weights for a particular domain, task, or output format.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that freezes the pre-trained model weights and trains small low-rank adapter matrices instead, reducing trainable parameters by up to roughly 10,000x (the figure reported for GPT-3 175B) while maintaining comparable task performance.
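The size of the saving depends on the matrix shape, the rank, and how many matrices receive adapters. For a single weight matrix the arithmetic looks like this; the 4096 x 4096 shape and rank 8 are illustrative assumptions.

```python
def full_params(d_out, d_in):
    """Parameters in the frozen weight matrix W (d_out x d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the adapters B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d = 4096
full = full_params(d, d)
lora = lora_trainable_params(d, d, rank=8)
print(full, lora, full // lora)  # 16777216 65536 256
```

The whole-model ratio is larger than this per-matrix ratio when only some of the model's matrices are adapted.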
RLHF (Reinforcement Learning from Human Feedback)
A training method that uses human preference ratings to fine-tune a model's behavior after initial pre-training, making it more aligned with desired outputs for helpfulness, accuracy, and safety.
Inference
JSON Mode
An API setting that constrains an LLM to output valid JSON without enforcing a specific schema.
KV Cache
A performance optimization that stores key and value tensors from attention computations for previously seen tokens, eliminating recomputation during autoregressive generation.
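A minimal single-head sketch of the idea: each generation step appends one key and one value to the cache and attends over everything stored so far, instead of rebuilding keys and values for the whole sequence. To keep it short, the token vectors double as queries, keys, and values (no learned projections), which is a simplifying assumption.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one query over the given keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


class KVCache:
    """Holds keys and values for all tokens processed so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, key, value, query):
        # Only the new token's key and value are added; older ones are reused.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
for tok in tokens:
    incremental = cache.step(key=tok, value=tok, query=tok)

# Recomputing from scratch for the last token gives the same result.
full = attend(tokens[-1], tokens, tokens)
print(incremental == full)  # True
```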
Latency vs Throughput
Latency measures how long a single request takes to complete; throughput measures how many requests (or tokens) can be processed per second. Improving one often costs the other.
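Batching is a typical case where the two metrics pull in opposite directions. The numbers below are made up for illustration: a server that answers one request in 0.5 s, or a batch of eight in 2 s.

```python
def throughput(batch_size, batch_seconds):
    """Requests completed per second when a batch finishes together."""
    return batch_size / batch_seconds


# Hypothetical timings, not measurements from a real system.
unbatched = {"latency_s": 0.5, "throughput_rps": throughput(1, 0.5)}
batched = {"latency_s": 2.0, "throughput_rps": throughput(8, 2.0)}

print(unbatched["throughput_rps"])  # 2.0
print(batched["throughput_rps"])    # 4.0
```

In this example batching doubles throughput, but every request now waits four times longer.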
Structured Output
Constraining an LLM to produce output in a specific format (JSON, XML, a defined schema) rather than free-form text.
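Valid JSON and schema-conformant JSON are different guarantees, so client code often checks the second one itself. A minimal sketch using only the standard library; the `REQUIRED` field map is a hypothetical schema, and a real project might use a dedicated schema validator instead.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED = {"name": str, "age": int}


def parse_structured(raw, required=REQUIRED):
    """Parse model output and check it against the expected fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for field: {key}")
    return data


print(parse_structured('{"name": "Ada", "age": 36}'))  # {'name': 'Ada', 'age': 36}

# Valid JSON, but it does not match the schema.
try:
    parse_structured('{"name": "Ada"}')
except ValueError as err:
    print(err)  # missing field: age
```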
Temperature
A sampling parameter that controls randomness in an LLM's output by scaling the probability distribution of tokens - lower values produce more deterministic results, higher values produce more varied results.
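The scaling can be sketched as dividing the logits by the temperature before the softmax. The logits below are made up; the point is only how the top token's probability moves as the temperature changes.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token; higher temperatures flatten the distribution.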
Top-p (Nucleus Sampling)
A sampling strategy that restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches a threshold p, so the number of candidates varies with the model's confidence.
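A short sketch of the filtering step, assuming the model's next-token probabilities are already available as a dictionary. The two distributions are made up to show how the candidate count changes with confidence.

```python
import random


def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def sample_top_p(probs, p, rng=random):
    """Draw one token from the filtered (nucleus) distribution."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


confident = {"the": 0.92, "a": 0.05, "an": 0.02, "this": 0.01}
uncertain = {"the": 0.3, "a": 0.3, "an": 0.2, "this": 0.2}

print(len(top_p_filter(confident, 0.9)), len(top_p_filter(uncertain, 0.9)))  # 1 4
```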
Prompting
Chain-of-Thought (CoT)
A prompting technique where a model is instructed to produce intermediate reasoning steps before generating a final answer, improving performance on complex tasks like math, logic, and code generation.
Prompt Injection
An attack where malicious instructions embedded in external content attempt to override an LLM's system prompt or alter its behavior.
System Prompt
Instructions provided to an LLM at the start of a conversation that define its persona, behavior, response format, and constraints.
Tool Use / Function Calling
The ability of an LLM to request execution of external functions, APIs, or tools by outputting structured function calls, enabling it to take actions beyond text generation.
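On the application side, a function call is just structured text that your code parses and dispatches. The sketch below assumes a simple JSON shape with `name` and `arguments` keys; that shape, the `get_weather` stub, and the `TOOLS` registry are hypothetical and do not mirror any specific vendor's wire format.

```python
import json


def get_weather(city):
    """Stub tool; a real one would call a weather API."""
    return f"Sunny in {city}"


# Registry of functions the model is allowed to request.
TOOLS = {"get_weather": get_weather}


def execute_tool_call(raw_call, tools=TOOLS):
    """Parse the model's structured call and run the matching function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in tools:
        raise ValueError(f"unknown tool: {name}")
    return tools[name](**call["arguments"])


# What a model might emit instead of free-form text (assumed format).
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(execute_tool_call(model_output))  # Sunny in Oslo
```

The model never runs the function itself; the application executes it and typically sends the result back to the model as the next message.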