Johannes von Oswald

My research is focused on neural network architectures, learning algorithms, hypernetworks, mechanistic interpretability, mesa-optimization and meta-learning.

Authored Publications
Transformers have become the state-of-the-art neural network architecture across numerous domains of machine learning. This is partly due to their celebrated ability to transfer and to learn in-context from a few examples. Nevertheless, why and how Transformers become in-context learners is not well understood and remains mostly a matter of intuition. Here, we argue that training Transformers on auto-regressive tasks can be closely related to well-known gradient-based meta-learning formulations. We do so by providing a simple construction that shows the equivalence of the data transformations induced by 1) a single linear self-attention layer and 2) gradient descent (GD) on a regression loss. Motivated by that construction, we show empirically that when self-attention-only Transformers are trained on simple regression tasks, either the models learned by GD and Transformers show great similarity or, remarkably, the solutions found by gradient descent converge in weight space to our construction. This allows us, at least on our simple regression tasks, to mechanistically understand the inner workings of Transformers that enable in-context learning. Finally, we discuss intriguing parallels to a mechanism identified as crucial for in-context learning, the induction head (Olsson et al., 2022), and show how it could be generalized by in-context learning by gradient descent within Transformers.
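The equivalence claimed in the abstract can be illustrated with a small numerical check. The sketch below is a simplified illustration, not the construction from the paper: it assumes scalar targets, a linear model initialized at zero, the loss L(w) = 1/(2N) Σ (w·x_i − y_i)², and an unnormalized (softmax-free) attention readout with keys x_i, values y_i, and the query x_query. All function names and the learning rate `eta` are hypothetical choices made for this example.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def gd_one_step_prediction(xs, ys, x_query, eta):
    """Prediction w1 . x_query after one GD step from w0 = 0 on
    L(w) = 1/(2N) * sum_i (w . x_i - y_i)^2."""
    n, d = len(xs), len(x_query)
    w = [0.0] * d
    # Gradient of the squared-error loss evaluated at the current w.
    grad = [
        sum((dot(w, x) - y) * x[j] for x, y in zip(xs, ys)) / n
        for j in range(d)
    ]
    w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, eta):
    """Softmax-free attention readout: sum_i value_i * (key_i . query),
    with keys x_i, values y_i, scaled by eta / N (assumed scaling)."""
    n = len(xs)
    return eta / n * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


# Toy in-context regression task with a random linear teacher.
random.seed(0)
w_true = [random.gauss(0, 1) for _ in range(3)]
xs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(8)]
ys = [dot(w_true, x) for x in xs]
x_query = [random.gauss(0, 1) for _ in range(3)]

pred_gd = gd_one_step_prediction(xs, ys, x_query, eta=0.1)
pred_attn = linear_attention_prediction(xs, ys, x_query, eta=0.1)
print(abs(pred_gd - pred_attn) < 1e-9)
```

Under these assumptions the two predictions agree up to floating-point rounding, because one GD step from zero gives w1 = (eta/N) Σ y_i x_i, so w1·x_query is exactly the attention-style weighted sum.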