Quantization

Infrastructure

A technique that reduces a model's memory footprint and compute cost by storing weights (and sometimes activations) at lower numerical precision, for example converting 16-bit floating point (FP16) values to 8-bit or 4-bit integers (INT8 or INT4). Because INT8 weights take a quarter of the space of FP32 and half that of FP16, quantization enables running large models on consumer GPUs with minimal quality loss.
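As a minimal sketch of the idea, the snippet below implements symmetric per-tensor INT8 quantization with NumPy: floats are scaled so the largest magnitude maps to 127, rounded to integers, and later multiplied back by the scale to approximate the originals. The function names and the per-tensor scheme are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integers and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.05, -0.42, 1.27, -1.27], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is at most half a quantization step (scale / 2),
# which is why quality loss stays small for well-scaled tensors.
```

Real systems refine this basic scheme with per-channel or per-group scales and calibration data to keep the error even lower.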
