llama.cpp
A highly optimized C/C++ inference engine for running LLMs locally on CPU and GPU. It supports the GGUF model format and a wide range of quantization levels, making it one of the most popular tools for running models on consumer hardware.