AI Glossary
Plain-English explanations of LLM and AI terms that matter for developers. No fluff, no hype - just what you need to understand to build with AI.
Core Concepts
Agentic AI
AI systems that autonomously carry out multi-step tasks, deciding which tools to use, in what order, and when the task is done.
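A minimal sketch of the control loop behind an agent. It assumes a hypothetical `scripted_model` function standing in for a real LLM call, which returns either a tool request or a final answer; the loop runs the requested tool, records the result, and stops at a step limit.

```python
from typing import Callable


def run_agent(task: str, model: Callable, tools: dict, max_steps: int = 5) -> str:
    """Let the model pick tools until it returns a final answer or hits max_steps."""
    history = [("user", task)]
    for _ in range(max_steps):
        # Hypothetical action format: {"tool": ..., "input": ...} or {"final": ...}
        action = model(history)
        if "final" in action:
            return action["final"]
        result = tools[action["tool"]](action["input"])
        history.append(("tool", f"{action['tool']} -> {result}"))
    return "Stopped: step limit reached"


# Hypothetical stand-in for an LLM: look something up once, then answer.
def scripted_model(history):
    if len(history) == 1:
        return {"tool": "lookup", "input": "capital of France"}
    return {"final": f"Answer based on {history[-1][1]}"}


tools = {"lookup": lambda q: "Paris" if "France" in q else "unknown"}
print(run_agent("What is the capital of France?", scripted_model, tools))
```

The step limit matters: a real model can keep requesting tools indefinitely, so production agents cap steps, time, or cost.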
Context Window
The maximum number of tokens an LLM can process in a single request - both your input and the model's output count against the limit.
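A small sketch of the budget arithmetic, assuming you already have token counts from the model's tokenizer. The 8,192-token window and 1,024-token output reservation are hypothetical figures.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Input and reserved output share one budget, so both must fit together."""
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(context_window: int, max_output_tokens: int) -> int:
    """Largest prompt that still leaves room for the requested output."""
    return max(0, context_window - max_output_tokens)


WINDOW = 8_192           # hypothetical context window
RESERVED_OUTPUT = 1_024  # tokens set aside for the model's reply

print(fits_in_context(7_000, RESERVED_OUTPUT, WINDOW))  # True  (8,024 <= 8,192)
print(fits_in_context(7_500, RESERVED_OUTPUT, WINDOW))  # False (8,524 > 8,192)
print(max_prompt_tokens(WINDOW, RESERVED_OUTPUT))       # 7168
```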
Inference
The process of running a trained model to generate outputs from inputs - the "serving" phase, as opposed to training.
Tokenization
The process of splitting text into tokens - the smallest units an LLM processes - which are usually subword pieces, not full words.
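To make "subword pieces" concrete, here is a toy greedy longest-match segmenter over a hypothetical vocabulary. It only illustrates the idea; production tokenizers (for example BPE) learn their vocabularies from data with different rules, so real splits will differ.

```python
def greedy_subword_tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:  # no vocabulary piece matched at position i
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens


vocab = {"token", "ization", "un", "believ", "able"}  # hypothetical toy vocabulary
print(greedy_subword_tokenize("tokenization", vocab))  # ['token', 'ization']
print(greedy_subword_tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
```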
Architecture
Embeddings
Numerical vector representations of text (or images) that capture semantic meaning, so that similar content maps to vectors that lie close together.
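"Close together" is usually measured with cosine similarity. A minimal sketch with made-up 3-dimensional vectors; real embedding models output hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for illustration only.
cat = [0.90, 0.10, 0.00]
kitten = [0.85, 0.15, 0.05]
invoice = [0.00, 0.20, 0.95]

print(round(cosine_similarity(cat, kitten), 3))   # high: related meanings
print(round(cosine_similarity(cat, invoice), 3))  # low: unrelated meanings
```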
MCP (Model Context Protocol)
An open standard that lets AI assistants securely connect to external data sources, APIs, and tools through a unified interface.
MoE (Mixture of Experts)
A model architecture where a learned router activates only a few "expert" sub-networks for each token, so the model can have a very large total parameter count while per-token inference cost stays manageable.
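A toy sketch of top-k routing for a scalar input. The router weights and the four lambda "experts" are hypothetical; in a real model the router is a learned projection of the token's hidden state and each expert is a feed-forward network. The key point is that every expert is scored but only the top k are run.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, router_weights, k=2):
    """Top-k MoE: score every expert, run only the k best, mix their outputs."""
    logits = [w * x for w in router_weights]  # hypothetical linear router
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])  # renormalize over the chosen experts
    output = sum(g * experts[i](x) for g, i in zip(gates, top))
    return output, top


experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: 10 * x]
router_weights = [0.1, 2.0, -1.0, 0.5]
out, chosen = moe_forward(3.0, experts, router_weights, k=2)
print(chosen, round(out, 3))  # experts 1 and 3 ran; experts 0 and 2 were skipped
```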
Multimodal
Models that can accept, and in some cases generate, multiple types of data - text, images, audio, and video - within a unified architecture.
RAG (Retrieval-Augmented Generation)
A technique that improves LLM responses by retrieving relevant documents from an external knowledge base before generation, reducing hallucinations and enabling access to information beyond training data.
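A minimal sketch of the retrieve-then-generate flow. To stay self-contained it ranks documents by simple word overlap, a stand-in for the embedding search a real RAG system would use, and it stops at building the prompt; the actual LLM call is left out.

```python
import re


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a stand-in for embedding search)."""
    q = words(query)
    return sorted(documents, key=lambda d: len(q & words(d)), reverse=True)[:k]


def build_rag_prompt(query, documents, k=2):
    """Put the retrieved passages in front of the question before calling the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "The refund window is 30 days from delivery.",
    "Support is available Monday to Friday.",
    "Shipping to Canada takes 5-7 business days.",
]
print(build_rag_prompt("How many days is the refund window?", docs, k=1))
```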
Vector Database
A database optimized for storing and querying high-dimensional embedding vectors, enabling fast approximate nearest-neighbor search.
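A toy in-memory store that shows the add/query interface a vector database exposes. It does exact brute-force cosine search; the "approximate" part of real systems comes from index structures (HNSW is a common one) that avoid scanning every vector, which this sketch does not attempt.

```python
import math


class InMemoryVectorStore:
    """Exact brute-force cosine search over stored vectors."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def add(self, item_id, vector):
        self._ids.append(item_id)
        # Store unit vectors so a plain dot product equals cosine similarity.
        self._vectors.append(self._normalize(vector))

    def query(self, vector, k=1):
        q = self._normalize(vector)
        scored = [(sum(a * b for a, b in zip(q, v)), item_id)
                  for item_id, v in zip(self._ids, self._vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item_id for _, item_id in scored[:k]]


store = InMemoryVectorStore()
store.add("doc-a", [1.0, 0.0])
store.add("doc-b", [0.0, 1.0])
store.add("doc-c", [0.9, 0.1])
print(store.query([1.0, 0.0], k=2))  # ['doc-a', 'doc-c']
```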
Training
Distillation
Training a smaller, cheaper model to mimic the outputs of a larger, more capable model.
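One common way to make the student "mimic" the teacher is a KL-divergence loss between temperature-softened output distributions, scaled by T². A minimal sketch with hypothetical logits; in practice this term is often combined with the ordinary loss on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # the small model's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
print(distillation_loss([4.0, 1.0, 0.5], teacher))  # identical logits -> 0.0
print(distillation_loss([0.5, 1.0, 4.0], teacher))  # disagreeing student -> large loss
```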
Fine-tuning
Continuing a pre-trained model's training on a smaller, task-specific dataset to adapt it for a particular domain or behavior.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that freezes the original weights and trains small low-rank matrices added alongside them, so only a tiny fraction of parameters is updated, making fine-tuning fast and cheap.
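A sketch of the LoRA forward pass, y = Wx + (α/r)·B(Ax), using toy 2×2 matrices in pure Python, plus the parameter-count arithmetic that explains why it is cheap. The 4096×4096 layer and rank 8 are hypothetical example sizes.

```python
def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x). W (d x k) stays frozen; only A (r x k) and B (d x r) train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d, k, r):
    """Trainable parameters: full fine-tuning of W versus the two LoRA factors."""
    return d * k, r * (d + k)


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight (toy)
A = [[1.0, 1.0]]              # rank r = 1
B = [[1.0], [0.0]]
print(lora_forward(W, A, B, [2.0, 3.0], alpha=1.0, r=1))  # [7.0, 3.0]

full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536
```

With these sizes the adapter trains about 0.4% as many parameters as full fine-tuning. LoRA initializes B to zeros, so training starts from exactly the frozen model's behavior.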
RLHF (Reinforcement Learning from Human Feedback)
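A training method that collects human preference ratings between model responses, trains a reward model on those ratings, and then uses reinforcement learning to optimize the LLM against that reward, shaping its behavior to be more helpful, honest, and safe.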
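A sketch of the reward-model step only: for each pair where a human preferred one response over another, the reward model is trained to minimize -log σ(r_chosen − r_rejected). The reward values are hypothetical, and the later reinforcement learning step (often PPO) is not shown.

```python
import math


def reward_model_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)) for one human preference pair."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


print(reward_model_loss(2.0, 0.0))  # reward model agrees with the human: small loss
print(reward_model_loss(0.0, 2.0))  # disagrees: larger loss
```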
Inference
JSON Mode
An API setting that forces an LLM to always output valid JSON, without enforcing a specific schema.
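Because JSON mode only guarantees syntax, the application still has to check fields and types. A minimal sketch using the standard `json` module; `REQUIRED_FIELDS` is a hypothetical schema for illustration.

```python
import json

# Hypothetical schema the application expects: field name -> Python type.
REQUIRED_FIELDS = {"name": str, "age": int}


def parse_and_check(raw):
    """JSON mode guarantees parseable JSON, not the right fields, so check them here."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the text is not valid JSON
    if not isinstance(data, dict):
        return data, ["top-level value is not an object"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}")
    return data, problems


print(parse_and_check('{"name": "Ada", "age": 36}'))  # ({'name': 'Ada', 'age': 36}, [])
print(parse_and_check('{"full_name": "Ada"}'))        # valid JSON, but both fields missing
```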
KV Cache
A performance optimization that stores the attention key and value tensors for previously processed tokens, so each new generation step computes them only for the new token instead of recomputing them for the whole sequence.
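A toy single-head sketch: each decoding step appends only the new token's key and value, then attends over everything cached. The per-token (query, key, value) vectors are hypothetical; a real model computes them by projecting hidden states, per layer and per head.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for a single query over all cached positions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class KVCache:
    """Keeps each token's key and value so later steps don't recompute them."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        self.keys.append(key)  # computed once, for the new token only
        self.values.append(value)
        return attend(query, self.keys, self.values)


# Hypothetical per-token (query, key, value) vectors.
tokens = [([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
          ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),
          ([1.0, 1.0], [1.0, 1.0], [5.0, 6.0])]
cache = KVCache()
outputs = [cache.step(q, k, v) for q, k, v in tokens]
print(len(cache.keys), outputs[-1])
```

Each step gives the same result as recomputing attention over the full prefix; the cache trades memory for skipping that recomputation.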
Latency vs Throughput
Latency measures how long a single request takes to complete; throughput measures how much work the system completes per second, usually counted in requests or tokens.
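The two are linked by Little's law: in steady state, throughput equals requests in flight divided by latency. A small sketch with hypothetical numbers showing why batching can raise throughput even while each request gets slower.

```python
def throughput_per_second(requests_in_flight, latency_seconds):
    """Little's law (L = λW) rearranged: throughput λ = requests in flight / latency."""
    return requests_in_flight / latency_seconds


# Hypothetical numbers: a bigger batch makes each request slower but completes more work.
single = throughput_per_second(requests_in_flight=1, latency_seconds=0.5)
batched = throughput_per_second(requests_in_flight=16, latency_seconds=1.5)
print(f"batch of 1:  latency 0.5 s, {single:.1f} req/s")
print(f"batch of 16: latency 1.5 s, {batched:.1f} req/s")
```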
Structured Output
Constraining an LLM to produce output in a specific format (JSON, XML, a defined schema) rather than free-form text.
Temperature
A sampling parameter that controls how random or deterministic an LLM's output is - lower values = more predictable, higher values = more creative.
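Temperature works by dividing the model's logits before the softmax. A minimal sketch with hypothetical logits for three candidate tokens; note how the top token's probability grows as temperature drops.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature (must be > 0), then apply softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature of exactly 0 would divide by zero here; implementations typically treat it as always picking the most likely token.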
Top-p (Nucleus Sampling)
A sampling strategy that dynamically limits token selection to the smallest set of tokens whose cumulative probability reaches or exceeds a threshold p.
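A minimal sketch of nucleus filtering followed by sampling, using a hypothetical next-token distribution and Python's `random` module. Tokens are taken in descending probability until the running total reaches p, and the survivors are renormalized.

```python
import random


def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalized distribution


def sample_top_p(probs, p, rng):
    filtered = top_p_filter(probs, p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
print(top_p_filter(probs, 0.75))  # tokens 0 and 1 survive
print(sample_top_p(probs, 0.75, random.Random(0)))
```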
Prompting
Chain-of-Thought (CoT)
A prompting technique where the model is asked to reason step-by-step before giving a final answer, which often improves accuracy on complex reasoning tasks.
Prompt Injection
An attack where malicious instructions hidden in external content attempt to override an LLM's system prompt or change its behavior.
System Prompt
Instructions provided to an LLM before the user conversation that define its persona, behavior, format, and constraints.
Tool Use / Function Calling
The ability of an LLM to request execution of external functions, APIs, or tools by emitting a structured call that the application then runs, letting the model act on the world rather than just generate text.
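A sketch of the application side: parse the model's tool call, look the function up in a registry, and run it. The `get_weather` tool and the `{"name": ..., "arguments": ...}` shape are hypothetical; the exact tool-call format varies by provider.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def execute_tool_call(tool_call_json: str) -> str:
    """Run a tool call emitted by the model and return the result to feed back to it."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call["name"])
    if func is None:
        return f"Error: unknown tool {call['name']}"
    return func(**call["arguments"])


# Hypothetical model output.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(execute_tool_call(model_output))  # Sunny in Oslo
```

The model never runs code itself; the application decides whether to execute the call, which is also where permission checks belong.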