Transactions on Machine Learning Research (2023) (to appear)
We discover that just placing two LayerNorms: before and after the patch embedding layer leads to improvements over well-tuned ViT models. In particular, this outperforms exhaustive search for alternative LayerNorm placement strategies in the transformer block itself.
Learn more about how we do research
We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work