AI Glossary
49 terms from the world of artificial intelligence and neural networks.
Fundamentals
Artificial Intelligence (AI): A branch of computer science focused on building systems capable of performing tasks that typically require human intelligence: speech recognition, decision-making, language translation, and content generation.
Machine Learning (ML): A subset of AI where algorithms learn patterns from data without being explicitly programmed. Includes supervised, unsupervised, and reinforcement learning.
Deep Learning: A subset of machine learning that uses neural networks with many layers (deep networks). Powers modern breakthroughs in language processing, computer vision, and content generation.
Neural Network: A mathematical model inspired by biological neurons. Consists of layers of nodes (neurons), each performing simple computations. Together, the layers can approximate complex functions.
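To make "layers of nodes performing simple computations" concrete, here is a minimal sketch of a two-layer network forward pass in plain Python. The weights and inputs are made-up illustration values, not from any trained model:

```python
def relu(xs):
    # The activation function: zero out negative values
    return [max(0.0, x) for x in xs]

def dense(inputs, weights, biases):
    # One fully connected layer; weights[j] holds the incoming
    # weights of output neuron j
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]

# 2 inputs -> 2 hidden neurons (ReLU) -> 1 output
hidden = relu(dense([1.0, 2.0], [[0.5, -0.3], [0.8, 0.1]], [0.0, 0.1]))
output = dense(hidden, [[1.0, -1.0]], [0.0])
```

Stacking more such layers, with nonlinear activations between them, is what lets the network approximate complex functions.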
Open-Source Models: AI models with open weights and/or code that can be downloaded and run locally. Examples: Llama, Mistral, Qwen, DeepSeek, Gemma. The opposite of proprietary models (GPT-4, Claude).
Benchmark: A standardized test for evaluating AI model quality. Popular benchmarks: MMLU (knowledge), HumanEval (code), MT-Bench (chat), LMSYS Chatbot Arena (Elo rating from user votes).
Models
Transformer: A neural network architecture proposed by Google in 2017. Uses an attention mechanism to process sequences. Powers GPT, BERT, Claude, Gemini, and most modern language models.
Large Language Model (LLM): A neural network with billions of parameters trained on massive text corpora. Capable of generating, analyzing, and transforming text. Examples: GPT-4, Claude, Gemini, Llama, DeepSeek.
Context Window: The maximum number of tokens a model can process in a single request (including input and output). GPT-4o has 128K, Claude 3.5 has 200K, Gemini has up to 1M.
Attention: A key component of transformers that allows the model to focus on relevant parts of the input. Self-attention lets each token 'look at' all other tokens in the sequence.
Multi-Head Attention: An extension of the attention mechanism where multiple attention 'heads' operate in parallel, each focusing on different aspects of the input, improving representation quality.
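The core of self-attention fits in a few lines: each token's query is compared against every key, the scores are normalized with softmax, and the values are mixed by those weights. A toy single-head sketch in plain Python (real implementations use learned projection matrices and batched tensor math, omitted here):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(q, k, v):
    # q, k, v: lists of d-dimensional token vectors
    d = len(q[0])
    out = []
    for qi in q:
        # score each key against this query, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                  for kj in k]
        weights = softmax(scores)
        # weighted mix of the value vectors
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy token vectors
out = self_attention(tokens, tokens, tokens)
```

Multi-head attention runs several copies of this with different projections and concatenates the results.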
Multimodal Model: An AI model capable of working with multiple data types: text, images, audio, video. Examples: GPT-4o (text + images), Gemini (text + images + video + audio).
AI Agent: An autonomous LLM-based system capable of planning actions, using tools (search, code, APIs), and iteratively solving tasks without constant human oversight.
Function Calling: An LLM's ability to call external functions and APIs: web search, code execution, database queries. A key capability for AI agents.
Mixture of Experts (MoE): An architecture where the model consists of multiple 'experts', but only a subset is activated for each request. This scales parameter count without a proportional increase in compute. Used in Mixtral and DeepSeek V3.
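The routing idea behind MoE can be sketched simply: a gate scores all experts, only the top-k are run, and their outputs are mixed by the normalized gate weights. The experts and scores below are toy stand-ins (real MoE layers route per token with learned gating networks):

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def moe_forward(x, experts, gate_scores, k=2):
    # Run only the top-k experts by gate score; the rest stay idle,
    # so compute grows with k, not with the total expert count
    top = sorted(range(len(experts)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    return sum(w * experts[i](x) for w, i in zip(weights, top))

experts = [lambda x: x + 1, lambda x: x * 2,
           lambda x: x - 3, lambda x: x / 2]
y = moe_forward(10.0, experts, gate_scores=[0.1, 2.0, 0.3, 1.5], k=2)
```

Here 4 experts exist but only 2 run per input, which is the source of the parameter/compute decoupling.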
Reasoning: An AI model's ability to reason logically, solve problems, and draw conclusions. Models with explicit reasoning (o1, o3, DeepSeek R1) use chain-of-thought internally.
Vision Model: An AI model trained to analyze images: classification, object detection, segmentation, captioning. Modern LLMs (GPT-4o, Claude, Gemini) include vision capabilities.
Training
Fine-Tuning: The process of further training a pre-trained model on a specialized dataset to adapt it for a specific task. Requires less data and compute than training from scratch.
LoRA (Low-Rank Adaptation): An efficient fine-tuning method that updates only low-rank adapters instead of all model weights. Significantly reduces memory and compute requirements.
RLHF (Reinforcement Learning from Human Feedback): A training method where the model improves based on human evaluations. Used for alignment: making models helpful, safe, and instruction-following.
Transfer Learning: An approach where a model trained on one task is adapted to solve another. The basis of fine-tuning: a base model is trained on general data, then refined on specialized data.
NLP
Token: The smallest unit of text that a language model processes. One token is roughly 4 characters in English or 1–2 characters in Russian. API pricing is typically calculated per token.
Tokenizer: An algorithm that splits text into tokens before feeding it to a language model. Different models use different tokenizers (BPE, SentencePiece, tiktoken).
Prompt Engineering: The practice of crafting text instructions (prompts) for AI models to get the best results. Includes techniques such as few-shot examples, chain-of-thought, and system prompts.
Few-Shot Learning: An approach where a model is given a few examples ('shots') in the prompt so it understands the expected format and style. Does not require fine-tuning.
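A few-shot prompt is just input/output pairs concatenated ahead of the real query. A minimal sketch (the sentiment examples and the `Input:`/`Output:` labels are arbitrary illustration choices, not a required format):

```python
def few_shot_prompt(examples, query):
    # Each example pair demonstrates the expected input -> output format;
    # the prompt ends with a bare "Output:" for the model to complete
    lines = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    [("I loved this movie", "positive"),
     ("Terrible service", "negative")],
    "The food was great",
)
```

The model, seeing the pattern, is far more likely to answer with a single sentiment label than with free-form prose.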
Zero-Shot Learning: A model's ability to perform a task without examples, from the text description alone. Larger models typically have better zero-shot capabilities.
Chain-of-Thought (CoT): A prompting technique where the model reasons step by step before giving a final answer. Improves accuracy on logic, math, and multi-step analysis tasks.
Retrieval-Augmented Generation (RAG): An architecture pattern where the model first retrieves relevant information from a knowledge base, then generates an answer grounded in the retrieved data. Reduces hallucinations and enables working with up-to-date information.
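The retrieve-then-generate flow can be sketched end to end with a toy retriever. Here the 'embedding' is just a bag-of-words count vector and the documents are invented examples; a real RAG system would use a neural embedding model and a vector database:

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a neural embedding: word-count vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query and keep the top-k
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The context window of GPT-4o is 128K tokens.",
    "Stable Diffusion generates images from text.",
    "LoRA updates only low-rank adapters during fine-tuning.",
]
best = retrieve("how many tokens fit in the context window", docs)[0]
prompt = (f"Answer using this context:\n{best}\n\n"
          "Question: how many tokens fit in the context window?")
```

The final prompt, context plus question, is what actually gets sent to the LLM, which is how retrieved facts end up grounding the answer.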
Embedding: A numerical representation of text (or another object) as a fixed-length vector. Used for semantic search, clustering, and recommendations. Models: text-embedding-ada-002, Cohere Embed.
Byte-Pair Encoding (BPE): A tokenization algorithm that iteratively merges the most frequent character pairs. Used in GPT, Claude, and most modern LLMs.
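The 'merge the most frequent pair' loop at the heart of BPE is short enough to show directly. This toy version works on a tiny made-up word-frequency table; production tokenizers operate on bytes and huge corpora, but the algorithm is the same:

```python
from collections import Counter

def most_frequent_pair(words):
    # Count every adjacent symbol pair, weighted by word frequency
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    # Replace every occurrence of the pair with one merged symbol
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Word frequencies, with each word pre-split into characters
vocab = {tuple("lower"): 5, tuple("lowest"): 2, tuple("low"): 7}
for _ in range(2):  # two merge steps: 'l'+'o' -> 'lo', then 'lo'+'w' -> 'low'
    vocab = merge_pair(vocab, most_frequent_pair(vocab))
```

After two merges the common stem 'low' has become a single token, which is exactly how frequent substrings end up as vocabulary entries.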
Temperature: A generation parameter controlling the randomness of model responses. Low temperature (0.0–0.3) yields deterministic answers; high temperature (0.7–1.0) produces more creative and varied outputs.
Top-p (Nucleus) Sampling: A sampling method where the model picks the next token from the smallest set of tokens whose cumulative probability is at least p. An alternative to top-k sampling.
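Temperature and top-p both act on the model's output distribution before a token is drawn, so one sketch can show them together. The logits and token strings below are invented for illustration:

```python
import math
import random

def apply_temperature(logits, temperature):
    # Divide logits by T, then softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_sample(tokens, probs, p=0.9, rng=random):
    # Keep the smallest high-probability prefix whose mass reaches p,
    # then sample from just that 'nucleus'
    ranked = sorted(zip(tokens, probs), key=lambda t: t[1], reverse=True)
    kept, total = [], 0.0
    for tok, pr in ranked:
        kept.append((tok, pr))
        total += pr
        if total >= p:
            break
    toks, prs = zip(*kept)
    return rng.choices(toks, weights=prs)[0]

probs = apply_temperature([2.0, 1.0, 0.1, -1.0], temperature=0.7)
token = top_p_sample(["the", "a", "cat", "xyzzy"], probs, p=0.9)
```

With p = 0.9 the long tail (here the implausible 'xyzzy') is cut off before sampling, which is why top-p avoids rare nonsense tokens without fixing a hard k.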
Generation
Diffusion Model: A type of generative model that learns to create data (typically images) by gradually removing noise. Used in Stable Diffusion, DALL-E 3, Midjourney, FLUX.
Stable Diffusion: An open-source diffusion model for generating images from text. Can run locally on consumer GPUs. Versions: SD 1.5, SDXL, SD 3.
Text-to-Image: The task of generating an image from a text description. Major services: Midjourney, DALL-E 3, Stable Diffusion, FLUX, Ideogram.
GAN (Generative Adversarial Network): An architecture pairing two neural networks: a generator creates data while a discriminator evaluates its realism. Used for image generation before diffusion models became dominant.
Text-to-Speech (TTS): Technology for converting text to natural speech. Modern TTS systems use neural networks to generate realistic voices. Examples: OpenAI TTS, ElevenLabs, Bark.
Speech-to-Text (STT): Technology for recognizing speech and converting it to text. Examples: OpenAI Whisper, Google Speech-to-Text, Deepgram. Whisper is a high-quality open model.
Infrastructure
Vector Database: A database optimized for storing and searching embeddings. Enables fast semantic similarity search. Examples: Pinecone, Weaviate, Qdrant, ChromaDB.
Quantization: A method of reducing model size by lowering weight precision (e.g. from FP16 to INT8 or INT4). Enables running large models on consumer GPUs with minimal quality loss.
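A minimal sketch of symmetric INT8 quantization shows where the size savings and the (small) precision loss come from. The weight values are made up; real quantization schemes add per-channel scales, zero points, and calibration:

```python
def quantize_int8(weights):
    # Symmetric scheme: map [-max|w|, max|w|] onto integers [-127, 127]
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate FP values; each int8 weight costs 1 byte
    # instead of 2 (FP16) or 4 (FP32)
    return [v * scale for v in q]

w = [0.12, -0.53, 0.99, -0.04]
q, scale = quantize_int8(w)
restored = dequantize(q, scale)
```

The round-trip error is bounded by the scale step, which is why well-chosen quantization loses so little quality while halving or quartering memory use.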
GGUF: A file format for quantized models used by llama.cpp and other tools for running LLMs locally. Supports various quantization levels (Q4, Q5, Q8).
VRAM: Video memory of a graphics processor. Determines the maximum model size that can be loaded for inference. Llama 70B in FP16 requires ~140 GB of VRAM.
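The ~140 GB figure follows from simple arithmetic: parameter count times bytes per weight. A back-of-the-envelope helper (weights only; the KV cache and activations add real-world overhead on top):

```python
def vram_estimate_gb(n_params, bits_per_weight):
    # bytes = params * bits / 8; report in GB (1e9 bytes)
    return n_params * bits_per_weight / 8 / 1e9

fp16 = vram_estimate_gb(70e9, 16)  # 70B params at 2 bytes each
int4 = vram_estimate_gb(70e9, 4)   # same model, 4-bit quantized
```

This is also why quantization matters for local inference: the same 70B model drops from ~140 GB at FP16 to ~35 GB at 4 bits.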
Inference: The process of getting a response from a trained model: feeding input and generating output. Unlike training, inference requires significantly less compute.
Latency: Model response time, from sending a request to receiving the first token (TTFT) or the full response. A key metric for production systems.
Throughput: The number of tokens per second a model can generate. Depends on model size, quantization, GPU, and the number of concurrent requests.
API: A programming interface for interacting with AI models. Allows sending requests and receiving responses via HTTP. Major providers: OpenAI, Anthropic, Google, DeepSeek.
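Most LLM APIs accept a JSON body in a similar shape. Below is a sketch of building such a request payload; the field names follow the widely used chat-completions style, and the model name and message contents are illustrative assumptions (check your provider's docs for the exact schema and endpoint):

```python
import json

payload = {
    "model": "gpt-4o",            # illustrative model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain tokens in one sentence."},
    ],
    "temperature": 0.7,           # sampling randomness
    "max_tokens": 200,            # cap on the generated output
}
body = json.dumps(payload)        # this string goes in the HTTP POST body
```

The response comes back as JSON too, typically containing the generated message plus token-usage counts used for billing.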
llama.cpp: A highly optimized engine for running LLMs locally on CPU and GPU. Supports the GGUF format and various quantization levels. One of the most popular tools for running models on consumer hardware.
Safety
Hallucination: When an AI model generates plausible but factually incorrect output. A key challenge of LLMs, mitigated through RAG, ground-truth data, and verification.
Alignment: The process of adjusting an AI model so its behavior aligns with human values and expectations. Includes RLHF, constitutional AI, and other safety techniques.