Google Research

Dual PatchNorm

Transactions on Machine Learning Research (2023) (to appear)


We find that simply placing two LayerNorms, one before and one after the patch embedding layer, leads to improvements over well-tuned ViT models. In particular, this outperforms an exhaustive search over alternative LayerNorm placement strategies in the transformer block itself.
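
As a rough illustration of the placement described above, the following NumPy sketch applies one LayerNorm to the flattened patches and a second one to the output of the linear patch embedding. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`layer_norm`, `patchify`, `dual_patchnorm_embed`), the patch size, the embedding width, and the random initialization are all hypothetical, and the learnable scale and offset that LayerNorm normally carries are omitted for brevity.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize over the last axis (learnable scale/offset omitted in this sketch)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def patchify(images, patch):
    """Split (B, H, W, C) images into flattened non-overlapping patches.

    Returns an array of shape (B, num_patches, patch * patch * C).
    Assumes H and W are divisible by `patch`.
    """
    b, h, w, c = images.shape
    x = images.reshape(b, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # group the two patch-grid axes together
    return x.reshape(b, (h // patch) * (w // patch), patch * patch * c)


def dual_patchnorm_embed(images, weight, bias, patch):
    """LayerNorm -> linear patch embedding -> LayerNorm."""
    x = patchify(images, patch)
    x = layer_norm(x)        # first LayerNorm: before the patch embedding
    x = x @ weight + bias    # linear patch embedding
    return layer_norm(x)     # second LayerNorm: after the patch embedding


# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
patch, dim = 8, 64
images = rng.normal(size=(2, 32, 32, 3))
weight = rng.normal(scale=0.02, size=(patch * patch * 3, dim))
bias = np.zeros(dim)

tokens = dual_patchnorm_embed(images, weight, bias, patch)
```

The resulting `tokens` would then be fed to the usual ViT pipeline (position embeddings followed by the transformer blocks), which is left unchanged in this sketch.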

