Are Pretrained Convolutions Better than Pretrained Transformers?

Yi Tay; Mostafa Dehghani; Jai Prakash Gupta; Vamsi Aribandi; Dara Bahri; Zhen Qin; Don Metzler

ACL 2021

Abstract

In the era of pretrained language models, Transformers are the de facto choice of model architecture. While recent work has shown promise in entirely convolution-based architectures, these CNN-based models have not been widely adopted or evaluated under the pretrain-finetune paradigm.
In the context of language models, are convolutional models competitive when pretrained?
This paper investigates this research question and presents several interesting findings. Across an extensive set of experiments, our findings show that CNN-based pretrained models are highly competitive and outperform Transformer-based pretrained models in certain scenarios, albeit with caveats. Overall, the findings of this paper should implore the broader academic community not to conflate pretraining advances with architectural advances; both sets of techniques could be studied in isolation.
