Benchmark

Fundamentals

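A standardized test for evaluating AI model quality: a fixed set of tasks plus a scoring rule, so that different models can be compared on the same footing. Popular benchmarks include MMLU (multiple-choice knowledge questions across many academic subjects), HumanEval (Python code generation checked by unit tests), MT-Bench (multi-turn chat, graded by an LLM judge), and LMSYS Chatbot Arena (Elo-style ratings derived from users' pairwise votes).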
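
As a rough illustration of the "fixed tasks plus scoring rule" idea, a multiple-choice benchmark can be scored as plain accuracy. This is a minimal sketch, not the scoring code of any benchmark named above: the question format, the toy data, and the names `benchmark_accuracy`, `TOY_BENCHMARK`, and `always_first` are all hypothetical.

```python
from typing import Callable, Sequence


def benchmark_accuracy(
    questions: Sequence[dict],
    answer_fn: Callable[[str, Sequence[str]], int],
) -> float:
    """Return the fraction of questions the model answers correctly.

    Each question is assumed to be a dict with a "prompt", a list of
    "choices", and the index of the correct choice under "answer".
    """
    if not questions:
        raise ValueError("benchmark has no questions")
    correct = sum(
        1
        for q in questions
        if answer_fn(q["prompt"], q["choices"]) == q["answer"]
    )
    return correct / len(questions)


# Hypothetical toy data standing in for a real benchmark's test set.
TOY_BENCHMARK = [
    {"prompt": "2 + 2 = ?", "choices": ["3", "4", "5"], "answer": 1},
    {"prompt": "Capital of France?", "choices": ["Paris", "Rome", "Oslo"], "answer": 0},
    {"prompt": "Largest planet?", "choices": ["Mars", "Venus", "Jupiter"], "answer": 2},
    {"prompt": "H2O is?", "choices": ["Salt", "Water", "Gold"], "answer": 1},
]


def always_first(prompt: str, choices: Sequence[str]) -> int:
    """Stand-in for a model: always picks the first choice."""
    return 0


score = benchmark_accuracy(TOY_BENCHMARK, always_first)
print(f"accuracy: {score:.2f}")  # prints "accuracy: 0.25"
```

Because every model sees the same questions and is scored by the same rule, the resulting numbers are directly comparable. Real benchmarks differ mainly in the task format and the scoring rule (unit tests for code, judge ratings or pairwise votes for chat).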

Related terms

Large Language Model (LLM)