Latency

Infrastructure

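The time a model takes to respond, measured from the moment a request is sent until either the first token arrives (time to first token, TTFT) or the full response has been received. TTFT determines how quickly a streaming interface starts showing output, while full-response latency determines how long the caller waits for the complete answer. A key metric for production systems.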
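Both measurements can be taken with a single timer around a streaming call. The sketch below is illustrative only: `measure_latency`, `LatencyResult`, and `fake_stream` are hypothetical names, and `fake_stream` stands in for a real streaming client by sleeping to simulate model delay.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple, Optional


class LatencyResult(NamedTuple):
    ttft_s: Optional[float]  # time to first token; None if no token arrived
    total_s: float           # time until the full response was received
    tokens: int              # number of tokens (chunks) received


def measure_latency(send_request: Callable[[], Iterable[str]]) -> LatencyResult:
    """Time a streaming request from send until first token and until completion."""
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    ttft = None
    count = 0
    for _ in send_request():
        if ttft is None:
            # First token has arrived: record TTFT once.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return LatencyResult(ttft, total, count)


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a model: delay, then emit tokens one by one."""
    time.sleep(0.05)  # simulated wait before the first token
    for token in ["Hello", ",", " world"]:
        yield token
        time.sleep(0.01)  # simulated gap between tokens


if __name__ == "__main__":
    result = measure_latency(fake_stream)
    print(f"TTFT: {result.ttft_s:.3f}s, total: {result.total_s:.3f}s, tokens: {result.tokens}")
```

In a production system these per-request values would typically be aggregated across many requests (for example as percentiles) rather than read one at a time.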
