Special CMX Seminar

Wednesday, June 26, 2024
12:00pm to 1:00pm
Online Event
A Mathematical Perspective on Transformers
Borjan Geshkovski, Junior Researcher, Laboratoire Jacques-Louis Lions at Sorbonne Université, Inria

This talk will report on several results, insights and perspectives that Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet and I have found regarding Transformers. We model Transformers as interacting particle systems on the unit sphere (each particle representing a token, and time representing a layer), with a non-linear coupling called self-attention. On high-dimensional spheres, we prove that randomly initialized particles converge to a single cluster in the long-time limit. The result can be quantified by describing the phase transition between the clustering and non-clustering regimes. The appearance of dynamic metastability will also be discussed.
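
For readers who want to experiment with the picture described in the abstract, the following is a minimal numerical sketch, not the speaker's own code. It assumes a simplified form of the dynamics that the abstract does not spell out: each particle moves toward a softmax-weighted average of all particles, projected onto the tangent space of the sphere, with identity query, key and value matrices. The inverse-temperature parameter `beta`, the explicit Euler step `dt`, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_attention_particles(n=32, d=64, beta=1.0, dt=0.1, steps=2000, seed=0):
    """Euler-discretized toy model of self-attention dynamics on the unit sphere.

    Assumed form (an illustration, not taken from the abstract):
        dx_i/dt = P_{x_i}( sum_j softmax_j(beta * <x_i, x_j>) * x_j ),
    where P_x projects onto the tangent space of the sphere at x.
    """
    rng = np.random.default_rng(seed)

    # Random initialization: normalized Gaussian vectors are uniform on the sphere.
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)

    for _ in range(steps):
        # Row-wise softmax of the scaled inner products (the attention weights).
        logits = beta * (x @ x.T)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)

        # Attention output, then removal of the component normal to the sphere.
        v = weights @ x
        v -= np.sum(v * x, axis=1, keepdims=True) * x

        # Explicit Euler step, followed by renormalization back onto the sphere.
        x = x + dt * v
        x /= np.linalg.norm(x, axis=1, keepdims=True)

    return x


def min_pairwise_inner_product(x):
    """Smallest inner product between any two particles (1.0 means one cluster)."""
    return float((x @ x.T).min())
```

Calling `min_pairwise_inner_product(simulate_attention_particles())` gives a rough clustering diagnostic: values near 1 indicate that the particles have collapsed to a single point. Varying `beta`, `d` and `steps` is one informal way to explore how quickly clustering sets in under these assumptions.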

