Throughput
The number of tokens per second a model can generate. It depends on model size, quantization, GPU, and the number of concurrent requests.
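As a rough illustration of the arithmetic, throughput can be computed by dividing the tokens generated by the wall-clock time taken. The sketch below is not tied to any particular inference server: the function name `throughput_tokens_per_second` and all of the numbers are made up for the example. It also shows why concurrency matters: the aggregate rate across all requests and the rate seen by a single request can differ.

```python
def throughput_tokens_per_second(tokens_generated: int, elapsed_seconds: float) -> float:
    """Return tokens generated per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tokens_generated / elapsed_seconds


# Hypothetical measurements (assumed values, not benchmark results):
# 4 concurrent requests, each producing 256 tokens, all served
# within the same 8-second window.
tokens_per_request = [256, 256, 256, 256]
window_seconds = 8.0

# Aggregate throughput: all tokens from all requests over the window.
aggregate = throughput_tokens_per_second(sum(tokens_per_request), window_seconds)

# Average rate each individual request experienced.
per_request = aggregate / len(tokens_per_request)

print(f"aggregate:   {aggregate:.1f} tokens/s")
print(f"per request: {per_request:.1f} tokens/s")
```

When comparing throughput figures, it helps to check whether a number is the aggregate across concurrent requests or the rate for one request, since the two are measured differently.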