Quantization

Infrastructure

A technique that reduces a model's memory footprint and compute cost by storing weights (and sometimes activations) at lower numerical precision, for example converting 16-bit floating point (FP16) values to 8-bit or 4-bit integers (INT8 or INT4). Because INT8 weights take a quarter of the space of FP32 and half that of FP16, quantization enables running large models on consumer GPUs with minimal quality loss.
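As a minimal sketch of the idea, the snippet below implements symmetric per-tensor INT8 quantization with NumPy: floats are scaled so the largest magnitude maps to 127, rounded to integers, and later multiplied back by the scale to approximate the originals. The function names and the per-tensor scheme are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integers and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.05, -0.42, 1.27, -1.27], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is at most half a quantization step (scale / 2),
# which is why quality loss stays small for well-scaled tensors.
```

Real systems refine this basic scheme with per-channel or per-group scales and calibration data to keep the error even lower.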
