Tackling fundamental questions in deep learning and physics using a scientific approach. Our main focus is on understanding and improving the capabilities of large language models.
About the team
Our goal is to understand the principles that govern machine learning systems and to improve their capabilities. We focus on understanding the limitations of large-scale transformer models and on extending their capabilities to solve challenging problems in areas such as mathematics, science, programming, algorithms, and planning.
In these domains, agents can make use of very long contexts, adaptive inference-time compute (e.g., scratchpads, recurrence, memory), external tools (e.g., a library of functions, a search engine, a calculator, additional models), or other methods to solve out-of-training-domain problems when given instructions and a few examples.
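As a rough illustration of this kind of tool use, the sketch below shows a generation loop in which a model can request an external calculator before committing to a final answer. The `generate_step` callable, the `scripted_model` stand-in, and the `CALC:` request format are hypothetical placeholders for illustration, not a real model or API.

```python
# Illustrative sketch of tool-augmented inference: a (hypothetical) language
# model callable `generate_step` emits either a final answer or a tool request
# such as "CALC: 12 * 7", whose result is appended to the context before the
# model continues.

def calculator(expression: str) -> str:
    """Toy 'external tool': evaluate a simple arithmetic expression."""
    # eval() is used only for this illustration; a real tool would parse safely.
    return str(eval(expression, {"__builtins__": {}}))

def solve_with_tools(prompt: str, generate_step, max_steps: int = 10) -> str:
    """Run the model step by step, executing any tool calls it requests."""
    context = prompt
    for _ in range(max_steps):
        step = generate_step(context)              # hypothetical model call
        if step.startswith("CALC:"):
            result = calculator(step[len("CALC:"):].strip())
            context += f"\n{step}\n=> {result}"    # feed tool output back in
        else:
            return step                            # model produced an answer
    return context

# Toy scripted "model": first requests a calculation, then answers with it.
def scripted_model(context: str) -> str:
    if "=>" not in context:
        return "CALC: 12 * 7"
    return "The answer is " + context.rsplit("=> ", 1)[-1]

print(solve_with_tools("What is 12 * 7?", scripted_model))  # The answer is 84
```

The same loop structure extends to the other tools listed above, such as a search engine or additional models, with each tool's output fed back into the model's context.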
Research areas
Team focus summaries
Highlighted projects
We introduce Minerva, a large language model that achieves state-of-the-art performance on mathematics, science, and engineering problems without the use of external tools.
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark consisting of 207 tasks contributed by over 400 authors across 132 institutions, intended to probe large language models and extrapolate their future capabilities.
We show that a 540-billion-parameter language model demonstrates the continued benefits of scaling by matching or surpassing human performance on a diverse set of tasks.
We show that asking large language models to write their intermediate computations in a scratchpad enables them to perform complex tasks involving multi-step computation.
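The sketch below gives a flavor of scratchpad prompting: instead of asking for the answer directly, the few-shot prompt demonstrates writing out digit-by-digit intermediate steps before the final answer. The exact prompt format here is an illustrative assumption, not the one used in the paper.

```python
# Illustrative few-shot "scratchpad" prompt: the model is shown how to write
# its intermediate computation before the answer, and is expected to continue
# the pattern for the new input.

SCRATCHPAD_PROMPT = """\
Input: 29 + 57
Scratchpad:
9 + 7 = 16, write 6, carry 1
2 + 5 + 1 = 8, write 8
Answer: 86

Input: 48 + 76
Scratchpad:
"""

# response = some_language_model(SCRATCHPAD_PROMPT)   # hypothetical model call
# The model should continue with the digit-by-digit steps and then
# "Answer: 124", rather than emitting the sum in a single step.
```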
We run careful empirical studies exploring the length generalization capabilities of transformer-based language models and highlight the role of in-context learning and scratchpads.
We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence, and has linear complexity with respect to sequence length.
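The following is a simplified PyTorch sketch of the block-recurrent idea, not the paper's exact architecture (which, among other details, uses gated updates for the recurrent state): the sequence is split into fixed-size blocks, a standard transformer layer processes each block together with a small carried state, and attention is only ever computed within a block, so total cost grows linearly with sequence length.

```python
# Simplified sketch of block-recurrent processing with a carried state.
import torch
import torch.nn as nn

class BlockRecurrentSketch(nn.Module):
    def __init__(self, d_model=256, nhead=4, block_size=128, state_len=32):
        super().__init__()
        self.block_size = block_size
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.init_state = nn.Parameter(torch.zeros(1, state_len, d_model))

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        state = self.init_state.expand(x.size(0), -1, -1)
        outputs = []
        for block in x.split(self.block_size, dim=1):
            # Run one transformer layer over the recurrent state plus the
            # current block only, then carry the updated state forward.
            joint = self.layer(torch.cat([state, block], dim=1))
            state, out = joint[:, :state.size(1)], joint[:, state.size(1):]
            outputs.append(out)
        return torch.cat(outputs, dim=1)

# y = BlockRecurrentSketch()(torch.randn(2, 512, 256))   # y: (2, 512, 256)
```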
We propose, derive, and investigate a categorization of scaling laws for generalization in neural networks.
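For context, empirical scaling laws are typically expressed as power laws in a resource such as dataset size or model size. The generic form below illustrates the kind of relationship being categorized; the symbols are assumed for illustration, and this is not the paper's specific taxonomy.

```latex
% Generic power-law scaling ansatz (illustrative, with assumed symbols):
% test loss L as a function of a resource x (dataset size D or parameter
% count N), with scaling exponent \alpha and irreducible loss L_\infty.
\[
  L(x) \approx \frac{a}{x^{\alpha}} + L_{\infty}, \qquad x \in \{D, N\}.
\]
```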
We propose Sharpness-Aware Minimization (SAM), an optimization algorithm that improves generalization by seeking parameters that lie in neighborhoods having uniformly low loss.
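A minimal sketch of one SAM step in PyTorch, assuming a single parameter tensor and a toy loss: the weights are first perturbed toward the scaled gradient direction, approximating the worst case within an L2 ball of radius rho, and the gradient at that perturbed point then drives the actual descent step. The `sam_step` helper and its arguments are our own naming for illustration, not a library API.

```python
# Minimal sketch of one Sharpness-Aware Minimization (SAM) update.
import torch

def sam_step(w, loss_fn, lr=0.1, rho=0.05):
    """One SAM update on parameter tensor `w` for the scalar loss `loss_fn(w)`."""
    # 1) Gradient at the current weights.
    grad = torch.autograd.grad(loss_fn(w), w)[0]
    # 2) Ascend to the approximate worst case within an L2 ball of radius rho.
    eps = rho * grad / (grad.norm() + 1e-12)
    # 3) Gradient at the perturbed weights defines the actual descent step.
    w_adv = (w + eps).detach().requires_grad_(True)
    grad_adv = torch.autograd.grad(loss_fn(w_adv), w_adv)[0]
    # 4) Apply the update to the original (unperturbed) weights.
    return (w - lr * grad_adv).detach().requires_grad_(True)

# Example: minimizing a toy quadratic loss.
w = torch.tensor([3.0, -2.0], requires_grad=True)
for _ in range(100):
    w = sam_step(w, lambda p: (p ** 2).sum())
```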
Featured publications
ICLR Oral (2021)
ICLR Spotlight (2019)