Benchmark
A standardized test for evaluating and comparing AI model quality. Popular benchmarks include MMLU (knowledge), HumanEval (code generation), MT-Bench (multi-turn chat), and LMSYS Chatbot Arena (Elo ratings derived from user votes).