
The Bunny That Broke the Transformer
Three researchers in Nanjing proved that Vision Transformers can't learn 3D spatial reasoning. The ceiling isn't data or training or scale. It's the architecture itself.