Latency
< GlossaryModel response time — from sending a request to receiving the first token (TTFT) or the full response. A key metric for production systems.
Model response time — from sending a request to receiving the first token (TTFT) or the full response. A key metric for production systems.