Multi-Head Attention
An extension of the attention mechanism in which multiple attention 'heads' operate in parallel, each attending to different aspects of the input; their outputs are concatenated and projected back to the model dimension, improving representation quality.
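The idea can be sketched in a few lines of NumPy: split the model dimension across heads, run scaled dot-product attention independently in each head, then concatenate and project. This is a minimal illustration with randomly initialized weights (the weight names `Wq`, `Wk`, `Wv`, `Wo` and the helper `multi_head_attention` are hypothetical, not from any particular library).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    # x: (seq_len, d_model). Weights are random here purely for illustration;
    # in a trained model they are learned parameters.
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    # One Q/K/V projection per head, plus a final output projection
    Wq, Wk, Wv = (rng.standard_normal((num_heads, d_model, d_head)) for _ in range(3))
    Wo = rng.standard_normal((d_model, d_model))
    heads = []
    for h in range(num_heads):
        Q, K, V = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        scores = softmax(Q @ K.T / np.sqrt(d_head))  # (seq_len, seq_len) attention weights
        heads.append(scores @ V)                     # (seq_len, d_head) per-head output
    # Concatenate the heads and mix them with the output projection
    return np.concatenate(heads, axis=-1) @ Wo       # (seq_len, d_model)

rng = np.random.default_rng(0)
out = multi_head_attention(rng.standard_normal((4, 8)), num_heads=2, rng=rng)
print(out.shape)  # (4, 8): same shape as the input
```

Because each head works in its own lower-dimensional subspace (d_head = d_model / num_heads), the total cost is comparable to single-head attention over the full dimension, while letting different heads specialize.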