Throughput

Infrastructure

The number of tokens per second a model can generate. It depends on the model size, the quantization used, the GPU, and the number of concurrent requests being served.
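
As a rough illustration of the definition, the sketch below computes throughput as generated tokens divided by elapsed wall-clock time, both for a single request and summed over concurrent requests. The function names and the sample numbers are hypothetical and chosen only for the example; they are not measurements from any real model or GPU.

```python
from typing import Sequence


def tokens_per_second(tokens_generated: int, elapsed_seconds: float) -> float:
    """Throughput of one measurement: generated tokens / wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tokens_generated / elapsed_seconds


def aggregate_tokens_per_second(
    tokens_per_request: Sequence[int], wall_clock_seconds: float
) -> float:
    """Total throughput when several requests run concurrently.

    All requests are assumed to share the same wall-clock window, so the
    token counts are summed before dividing by the elapsed time.
    """
    return tokens_per_second(sum(tokens_per_request), wall_clock_seconds)


# Hypothetical numbers: 4 concurrent requests, 250 tokens each, 10 seconds total.
per_request = tokens_per_second(250, 10.0)
total = aggregate_tokens_per_second([250, 250, 250, 250], 10.0)
print(f"per request: {per_request} tokens/s, aggregate: {total} tokens/s")
```

With these made-up figures each request sees 25 tokens per second while the server as a whole produces 100 tokens per second, which is why it helps to state whether a throughput number is per request or aggregate.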

Related terms

Latency, Inference