Multi-Head Attention


An extension of the attention mechanism in which multiple attention 'heads' run in parallel, each learning to focus on different aspects of the input; their outputs are concatenated and projected back to the model dimension, improving representation quality.
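The idea can be sketched in a few lines of NumPy: project the input into queries, keys, and values, split each projection into heads, run scaled dot-product attention independently per head, then concatenate and project the results. This is a minimal illustrative sketch (the weight names and shapes are assumptions, not a specific library's API):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Split projections into heads, attend in parallel, then recombine."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # (seq_len, d_model) -> (num_heads, seq_len, d_head)
    def split(m):
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)

    # Scaled dot-product attention, computed per head in parallel.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # each head learns its own attention pattern
    heads = weights @ v                  # (num_heads, seq_len, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Toy usage with random weights (hypothetical sizes for illustration).
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # (4, 8)
```

Because each head attends over a smaller `d_head`-dimensional subspace, the total compute is similar to single-head attention, but the heads can specialize in different relationships between positions.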
