Mixture of Experts (MoE)


An architecture in which the model's feed-forward layers are replaced by multiple 'experts', with a learned router activating only a small subset (typically the top-k scoring experts) for each token. This scales the parameter count without a proportional increase in compute per token, since most experts sit idle on any given input. Used in Mixtral and DeepSeek-V3.
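The routing idea can be sketched in a few lines. This is a minimal illustration, not any production implementation: the expert count, top-k value, and the use of tiny linear maps as stand-ins for full feed-forward experts are all assumptions made for brevity.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 4  # total experts: parameter count scales with this
TOP_K = 2        # experts run per token: compute scales with this
DIM = 8          # toy hidden dimension

# Hypothetical stand-ins for full FFN experts: one small linear map each.
experts = [[[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# Router: one weight vector per expert, producing a score for the token.
router = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moe_layer(token):
    # Score every expert, but only run the TOP_K highest-scoring ones.
    scores = [dot(w, token) for w in router]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected scores gives the mixing weights.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    out = [0.0] * DIM
    for i, e in zip(top, exps):
        y = [dot(row, token) for row in experts[i]]  # run one active expert
        out = [o + (e / total) * yi for o, yi in zip(out, y)]
    return out, top

token = [random.gauss(0, 1) for _ in range(DIM)]
output, active = moe_layer(token)
print(f"activated experts {active} out of {NUM_EXPERTS}")
```

Only `TOP_K` of the `NUM_EXPERTS` weight matrices are touched per token, which is why total parameters can grow without compute growing in step.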
