- Felix Li
- Percy Liang
- Yuhuai Tony Wu
Pre-training has shown successes in a wide range of domains. However, it remains unclear which properties of a pre-training dataset are needed for effective pre-training. Notably, recent work shows that even pre-training on synthetic tasks can achieve significant gains on downstream tasks. In this work, we perform three experiments that iteratively simplify pre-training to reduce it to its essence. First, building on prior work, we perform a wider range of evaluations of three existing synthetic pre-training methods over six downstream tasks. Second, we discover that some simple, generic synthetic tasks achieve almost the same level of benefit as the previous best synthetic task. Lastly, we find that 70% of the benefit can be explained by using a better scale for initialization. We believe our study provides useful and important insights and opens a new direction for understanding the effect of pre-training.
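The third finding, that much of the benefit is matched simply by choosing a better initialization scale, amounts to rescaling the standard deviation of freshly drawn weights. A minimal numpy sketch of the idea follows; the `scale` value and the fan-in rule here are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def scaled_init(shape, scale=1.0, seed=0):
    """Draw a weight matrix from N(0, (scale / sqrt(fan_in))^2).

    scale=1.0 mimics a standard fan-in initialization; a smaller `scale`
    shrinks the init std, standing in for the "better scale" in the abstract.
    (Hypothetical helper for illustration only.)
    """
    rng = np.random.default_rng(seed)
    fan_in = shape[0]
    std = scale / np.sqrt(fan_in)
    return rng.normal(0.0, std, size=shape)

# Default-scale vs. shrunken-scale initialization of the same layer shape.
W_default = scaled_init((1024, 1024), scale=1.0)
W_small = scaled_init((1024, 1024), scale=0.1)
print(float(W_default.std()), float(W_small.std()))
```

The comparison is the whole point: the downstream model is identical except that its weights start from a distribution with a different standard deviation.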