AI Glossary

Essential AI and machine learning terms explained. From LLM to RAG to Transformers, master the vocabulary of artificial intelligence.

📚 Your AI Vocabulary Guide

Whether you're new to AI or an experienced practitioner, this glossary covers the essential terminology you need to understand modern AI tools and technologies.

LLM (Large Language Model)

A large language model is a neural network trained on vast amounts of text data to understand and generate human language. Examples include GPT-4, Claude, and Gemini. LLMs can perform tasks like text completion, translation, summarization, and reasoning.

RAG (Retrieval-Augmented Generation)

RAG is a technique that combines retrieval systems with generative AI. It fetches relevant information from external sources (like a knowledge base) and feeds it to the LLM to improve the accuracy and relevance of responses, especially for domain-specific queries.
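The retrieve-then-generate flow can be sketched in a few lines. This is an illustrative toy, not a real RAG library: the corpus, the word-overlap scoring (a stand-in for embedding similarity), and the prompt wording are all assumptions.

```python
# Minimal RAG sketch: retrieve relevant snippets, then build a grounded prompt.

def retrieve(query, corpus, k=2):
    """Score each document by word overlap with the query (a stand-in
    for real embedding-based similarity) and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda doc: -len(q_words & set(doc.lower().split())))
    return scored[:k]

def build_prompt(query, corpus):
    """Assemble the augmented prompt the LLM would receive."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
    "Shipping to most regions takes 3 to 5 business days.",
]
prompt = build_prompt("How long is the refund window?", corpus)
print(prompt)
```

The LLM never needs to have memorized the refund policy; the relevant passage is fetched at query time and placed directly in the prompt.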

Fine-tuning

Fine-tuning is the process of taking a pre-trained model (like an LLM) and training it further on a specific dataset to improve its performance for a particular task or domain. This is more efficient than training from scratch.

Embeddings

Embeddings are numerical vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space. Similar concepts are represented by similar vectors, enabling similarity search and semantic understanding.
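"Similar concepts get similar vectors" can be made concrete with cosine similarity, the standard closeness measure for embeddings. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 for vectors
    pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat = [0.9, 0.8, 0.1]       # toy embedding for "cat"
kitten = [0.85, 0.75, 0.2]  # close in meaning, close in space
car = [0.1, 0.2, 0.9]       # different meaning, far in space

print(cosine_similarity(cat, kitten))  # close to 1
print(cosine_similarity(cat, car))     # much lower
```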

Vector Database

A vector database stores embeddings and enables efficient similarity searches. It's commonly used with RAG systems to retrieve relevant context from large document collections based on semantic similarity.
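Stripped to its core idea, a vector database stores (id, embedding) pairs and returns the nearest neighbors of a query vector. The brute-force sketch below is illustrative; production systems add approximate indexes to stay fast over millions of vectors.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class ToyVectorStore:
    def __init__(self):
        self.items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def search(self, query_vector, k=1):
        """Exact (brute-force) nearest-neighbor search by cosine similarity."""
        ranked = sorted(self.items, key=lambda item: -cosine(item[1], query_vector))
        return [doc_id for doc_id, _ in ranked[:k]]

store = ToyVectorStore()
store.add("doc-cats", [0.9, 0.1])
store.add("doc-cars", [0.1, 0.9])
print(store.search([0.8, 0.2], k=1))  # the query vector is closest to doc-cats
```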

Prompt Engineering

Prompt engineering is the art and science of crafting effective inputs (prompts) to AI models to achieve desired outputs. It involves techniques like few-shot learning, chain-of-thought prompting, and prompt templating.

Chain of Thought

Chain of thought is a prompting technique where the AI is encouraged to show its intermediate reasoning steps before providing a final answer. This leads to more accurate results, especially for complex logical or mathematical problems.
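In practice a chain-of-thought prompt is often just an ordinary prompt with an added instruction (or worked examples) that elicits intermediate steps. The wording below is one common, illustrative phrasing, not a canonical format.

```python
# Building a chain-of-thought prompt: the extra instruction asks the
# model to reason before answering.
question = "A shop sells pens at $3 each. How much do 4 pens cost?"
cot_prompt = (
    f"{question}\n"
    "Let's think step by step, then give the final answer on its own line."
)
print(cot_prompt)
```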

Token

A token is the basic unit of text that an LLM processes. A token may be a single character, a word piece, or a whole word; uncommon words are split into pieces (e.g., "tokenization" might become ["token", "ization"]). LLM context windows are measured in tokens (e.g., 128K tokens).
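A toy greedy longest-match tokenizer shows how a word gets split against a vocabulary. Both the two-entry vocabulary and the algorithm are simplifications for illustration; real tokenizers (BPE, WordPiece, SentencePiece) learn large vocabularies from data.

```python
VOCAB = {"token", "ization", "ize"}  # toy subword vocabulary

def tokenize(word, vocab):
    """Greedily take the longest vocabulary entry matching at each position."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return tokens

print(tokenize("tokenization", VOCAB))  # → ['token', 'ization']
print(tokenize("tokenize", VOCAB))      # → ['token', 'ize']
```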

Hallucination

Hallucination occurs when an AI model generates information that sounds plausible but is actually incorrect, fabricated, or not supported by its training data. This is a major challenge in deploying LLMs for factual knowledge tasks.

Zero-shot

Zero-shot refers to an AI model performing a task without being given any examples of it — the model relies on its instructions and general training alone. For example, an LLM can often classify the sentiment of a movie review even though the prompt contains no labeled sentiment examples.

Few-shot

Few-shot learning is a prompting technique where examples of the desired task are included in the prompt itself. This helps the model understand the pattern without requiring fine-tuning.
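Because few-shot prompting is just "examples in the prompt," it reduces to string assembly. The sentiment task and example wording below are illustrative.

```python
# Few-shot prompting: the in-prompt examples teach the pattern,
# with no change to the model's weights.

def few_shot_prompt(examples, query):
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")  # model completes this line
    return "\n\n".join(lines)

examples = [
    ("I loved every minute of it.", "positive"),
    ("Total waste of money.", "negative"),
]
prompt = few_shot_prompt(examples, "The acting was superb.")
print(prompt)
```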

AGI (Artificial General Intelligence)

Artificial General Intelligence refers to a hypothetical AI system that can understand, learn, and apply intelligence across any domain at a human level or beyond. Current AI systems are narrow and specialized, not general.

Diffusion Model

Diffusion models are a class of generative AI models that create images or other data by starting with noise and gradually denoising it to match a target distribution. They power tools like DALL-E, Stable Diffusion, and Midjourney.

GAN (Generative Adversarial Network)

A GAN consists of two neural networks—a generator and a discriminator—that compete against each other. The generator creates fake data while the discriminator tries to distinguish real from fake, leading to increasingly realistic outputs.

Transformer

The transformer is a neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data. It's the foundation of modern LLMs and many other AI models, enabling parallel processing of text.

Attention Mechanism

Attention mechanisms allow neural networks to focus on the most relevant parts of the input when processing data. In transformers, self-attention computes relationships between all positions in a sequence simultaneously.
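The core computation, scaled dot-product attention, fits in a few lines: each output position is a softmax-weighted mix of the value rows, with weights given by query-key dot products. The 2x2 matrices are illustrative; real models use many heads and hundreds of dimensions.

```python
import math

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(K[0])  # key dimension, used to scale the dot products
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # how much this position attends to each other position
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))  # each output row is a weighted average of V's rows
```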

Multimodal

Multimodal AI systems can process and understand multiple types of data—text, images, audio, video—rather than just one. GPT-4V and Gemini are examples of multimodal models that can analyze images and text together.

Agent

An AI agent is a system that can autonomously plan, reason, and take actions to achieve goals. Agents often use tools (like web search or code execution) and can break complex tasks into smaller steps.

MCP (Model Context Protocol)

MCP is an open protocol that enables AI models to connect with external tools and data sources. It provides a standardized way for AI systems to access databases, APIs, and other resources to enhance their capabilities.

Tool Use

Tool use in AI refers to the ability of models to interact with external systems—web browsers, calculators, code interpreters, APIs—to accomplish tasks beyond their internal knowledge or capabilities.

Quantization

Quantization reduces the precision of model weights (e.g., from 32-bit to 8-bit) to decrease model size and speed up inference with minimal accuracy loss. It's essential for running large models on consumer hardware.
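A symmetric 8-bit scheme makes the size/precision trade-off concrete: map floats to integers in [-127, 127] with one shared scale, then map back. Real quantization schemes are more elaborate (per-channel scales, 4-bit formats); the values here are illustrative.

```python
def quantize(weights):
    """Map floats to int8-range integers using one shared scale."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.52, -1.30, 0.07, 0.91]
q, scale = quantize(weights)
restored = dequantize(q, scale)
print(q)         # small integers instead of 32-bit floats
print(restored)  # close to, but not exactly, the originals
```

The round trip loses a little precision, which is exactly the "minimal accuracy loss" the entry describes.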

RLHF (Reinforcement Learning from Human Feedback)

RLHF is a training technique where a model is fine-tuned using human preference data. Human evaluators rank model responses, and this feedback guides the model toward more helpful, harmless, and accurate outputs.

DPO (Direct Preference Optimization)

DPO is a method for aligning language models with human preferences without requiring the complex reward modeling of RLHF. It directly optimizes the model based on preference comparisons between outputs.

LoRA (Low-Rank Adaptation)

LoRA is a parameter-efficient fine-tuning technique that adds small trainable matrices to existing model weights. It allows customization of large models with minimal computational cost and without catastrophic forgetting.
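The parameter savings come from matrix shapes. Instead of updating a d×d weight matrix W, LoRA trains a d×r matrix A and an r×d matrix B (with r much smaller than d) and uses W + AB. The pure-Python sketch below uses tiny illustrative matrices.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(W, delta):
    return [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]

d, r = 4, 1  # full dimension vs. low rank
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base weights
A = [[0.1] for _ in range(d)]   # d×r, trainable
B = [[0.2, 0.0, 0.0, 0.0]]      # r×d, trainable

W_eff = add(W, matmul(A, B))    # effective weights at inference
trainable = d * r + r * d       # 8 trainable numbers instead of d*d = 16
print(trainable, d * d)
```

At realistic sizes (d in the thousands, r of 8 or 16) the trainable fraction becomes a tiny percentage of the full matrix.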

Inference

Inference is the process of using a trained AI model to make predictions or generate outputs. It's what happens when you actually use an AI tool—running the model on new input data to get results.

Training

Training is the process of teaching an AI model by exposing it to large datasets and adjusting its internal parameters (weights) to minimize prediction errors. Training is computationally expensive and happens before deployment.

Parameters

Parameters are the internal weights and biases that define a neural network's behavior. Larger models have more parameters—GPT-3 has 175 billion parameters. More parameters generally mean more capabilities but also higher computational costs.

Context Window

The context window is the maximum amount of text (measured in tokens) an LLM can process in a single inference. It includes both the input prompt and the generated output. Larger context windows enable longer conversations and document analysis.
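Since input and output share the window, the remaining output budget is simple arithmetic. The 128K figure below is illustrative; actual window sizes vary by model.

```python
CONTEXT_WINDOW = 128_000  # illustrative window size, in tokens

def max_output_tokens(prompt_tokens, window=CONTEXT_WINDOW):
    """Tokens left for generation after the prompt is counted."""
    return max(0, window - prompt_tokens)

print(max_output_tokens(120_000))  # → 8000
print(max_output_tokens(130_000))  # → 0 (prompt alone overflows the window)
```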

Temperature

Temperature controls the randomness of AI outputs. Low temperature (e.g., 0.1) produces more deterministic, focused responses. High temperature (e.g., 0.9+) produces more creative, varied outputs. A temperature of 0 typically means greedy decoding: the model always picks the single most likely next token.
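Mechanically, temperature divides the model's raw scores (logits) before the softmax: a small temperature sharpens the distribution, a large one flattens it. The logits below are illustrative.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits scaled by 1/temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.1))  # nearly all mass on the top token
print(softmax_with_temperature(logits, 2.0))  # much flatter distribution
```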

Top-p (Nucleus Sampling)

Top-p (or nucleus sampling) is a text generation technique where the model considers only the smallest set of tokens whose cumulative probability exceeds p (e.g., 0.9). It's an alternative to temperature that adaptively controls output diversity.
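The filtering step of nucleus sampling is short: sort tokens by probability and keep the smallest set whose cumulative probability reaches p. The token probabilities below are illustrative.

```python
def top_p_filter(probs, p):
    """probs: {token: probability}. Returns the nucleus to sample from."""
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return nucleus

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
print(top_p_filter(probs, 0.9))  # → ['the', 'a', 'cat']
print(top_p_filter(probs, 0.5))  # → ['the']
```

Note how the nucleus adapts: a confident distribution keeps few tokens, a flat one keeps many, which is why top-p is often preferred over a fixed top-k cutoff.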

Continue Learning

Ready to explore AI tools? Check out our curated directory of the best AI tools organized by category.