Attention Guidance: Guiding Attention for Self-Supervised Learning with Transformers
Talk, IBM Research, New York, USA
Presented our work on attention guidance, in which we use intuitive priors to modify self-attention heads in Transformers, yielding faster convergence and better performance. [Slides]
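To give a flavor of the idea, here is a minimal sketch, not the exact method from the talk: it assumes the prior is a fixed "attend to the previous token" pattern and that guidance is applied as an auxiliary MSE loss between a head's attention map and that pattern. The names `previous_token_prior`, `attention_guidance_loss`, and the weight `lambda_guidance` are illustrative.

```python
import torch
import torch.nn.functional as F

def previous_token_prior(seq_len: int) -> torch.Tensor:
    """One intuitive prior pattern: each position attends to the previous token."""
    prior = torch.zeros(seq_len, seq_len)
    prior[torch.arange(1, seq_len), torch.arange(seq_len - 1)] = 1.0
    prior[0, 0] = 1.0  # first token has no predecessor, so it attends to itself
    return prior

def attention_guidance_loss(attn: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    """MSE between a head's attention map (batch, seq, seq) and the prior pattern."""
    return F.mse_loss(attn, prior.expand_as(attn))

# Toy usage with random attention logits for a single head.
batch, seq = 2, 8
logits = torch.randn(batch, seq, seq, requires_grad=True)
attn = logits.softmax(dim=-1)

guidance = attention_guidance_loss(attn, previous_token_prior(seq))

# In training, this term would be added to the main self-supervised objective,
# e.g. total_loss = mlm_loss + lambda_guidance * guidance (weight assumed).
guidance.backward()
print(guidance.item())
```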