Revisiting ResNets: Improved Training Methodologies and Scaling Principles

Irwan Bello
Liam B. Fedus
Xianzhi Du
Aravind Srinivas
Tsung-Yi Lin
Jon Shlens
Barret Richard Zoph
ICML 2021 (to appear)

Abstract

Novel ImageNet architectures monopolize the limelight when advancing the state of the art, but progress is often muddled by simultaneous changes to training methodology and scaling strategies. Our work disentangles these factors by revisiting the ResNet architecture with modern training and scaling techniques and, in doing so, shows that ResNets match recent state-of-the-art models. A ResNet trained to 79.0% top-1 ImageNet accuracy reaches 82.2% through improved training methodology alone; two small, popular architecture changes further improve this to 83.4%. We next offer new perspectives on the scaling strategy, which we summarize with two key principles: (1) increase model depth and image size, but not model width; (2) increase image size far more slowly than previously recommended. Using the improved training methodology and our scaling principles, we design a family of ResNet architectures, ResNet-RS, which are 1.9x - 2.3x faster than EfficientNets in supervised learning on ImageNet. Although the EfficientNets have significantly fewer FLOPs and parameters, training ResNet-RS is both faster and less memory-intensive, making it a strong baseline for researchers and practitioners.
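
To make the two scaling principles concrete, here is a minimal Python sketch of a scaling schedule that grows model depth while increasing image size only slowly and holding width fixed. The class name, the depth increments, and the ~10% resolution growth factor are illustrative placeholders, not the actual ResNet-RS configurations reported in the paper.

```python
# Hedged sketch of the scaling principles: scale depth aggressively,
# scale image size slowly, and leave width (channel multiplier) untouched.
# All specific numbers below are assumptions for illustration only.

from dataclasses import dataclass


@dataclass
class ScalingConfig:
    depth: int               # number of layers (e.g. 50, 101, 152, ...)
    image_size: int          # training resolution in pixels
    width_mult: float = 1.0  # kept fixed: principle (1) says do not scale width


def scale_up(base: ScalingConfig, step: int) -> ScalingConfig:
    """Return a larger config: more depth, slightly larger images, same width."""
    return ScalingConfig(
        depth=base.depth + 50 * step,                      # grow depth each step (placeholder increment)
        image_size=int(base.image_size * (1.1 ** step)),   # ~10% per step: far slower than compound scaling
        width_mult=base.width_mult,                        # unchanged, per principle (1)
    )


if __name__ == "__main__":
    base = ScalingConfig(depth=50, image_size=160)
    for step in range(4):
        print(scale_up(base, step))
```

A schedule like this trades extra depth for capacity while keeping the memory and compute cost of large images in check, which is the intuition behind increasing image size far more slowly than prior compound-scaling recipes.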