Insights from Pre-training on Simple Synthetic Tasks

Felix Li
Percy Liang
Yuhuai Tony Wu
NeurIPS (2022) (to appear)


Pre-training has proven successful across a wide range of domains. However, it remains unclear which properties of the pre-training dataset are needed for pre-training to be effective. Notably, recent work shows that even pre-training on synthetic tasks can yield significant gains on downstream tasks. In this work, we perform three experiments that iteratively simplify pre-training to reduce it to its essence. First, building on prior work, we perform a wider range of evaluations of three existing synthetic pre-training methods across six downstream tasks. Second, we find that some simple and generic synthetic tasks achieve almost the same benefits as the previous best synthetic task. Lastly, we find that 70% of the benefits can be explained by using a better scale for initialization. We believe our study provides useful insights and opens a new direction for understanding the effect of pre-training.
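
To make the last finding more concrete, the following is a minimal sketch of one possible reading of "a better scale for initialization": draw fresh random weights whose per-tensor standard deviation matches that of a pre-trained model, so that only the scale is transferred and none of the learned values. The abstract does not specify the procedure, so the function name `scale_matched_init`, the parameter names, the shapes, and the choice of a zero-mean Gaussian are all assumptions made for illustration.

```python
import numpy as np


def scale_matched_init(pretrained, seed=0):
    """Draw fresh Gaussian weights whose per-tensor std matches `pretrained`.

    `pretrained` is a hypothetical dict mapping parameter names to arrays.
    Only the scale (standard deviation) is copied, never the values.
    """
    rng = np.random.default_rng(seed)
    fresh = {}
    for name, w in pretrained.items():
        std = float(np.std(w))  # per-tensor scale of the pre-trained weights
        fresh[name] = rng.normal(loc=0.0, scale=std, size=w.shape)
    return fresh


# Toy stand-in for pre-trained parameters (made-up names, shapes, and scales).
_rng = np.random.default_rng(1)
pretrained = {
    "attn.w": _rng.normal(0.0, 0.05, size=(256, 256)),
    "mlp.w": _rng.normal(0.0, 0.20, size=(256, 1024)),
}

init = scale_matched_init(pretrained)
for name in pretrained:
    print(name, float(np.std(pretrained[name])), float(np.std(init[name])))
```

Under these assumptions, a model initialized from `init` carries no information from pre-training other than the per-tensor scale, which is the kind of comparison the final experiment describes.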
