Alexander Mordvintsev

Authored Publications
    Transformers have become the state-of-the-art neural network architecture across numerous domains of machine learning. This is partly due to their celebrated ability to transfer and to learn in-context based on a few examples. Nevertheless, the mechanism of why and how Transformers become in-context learners is not well understood and remains mostly an intuition. Here, we argue that training Transformers on auto-regressive tasks can be closely related to well-known gradient-based meta-learning formulations. We do so by providing a simple construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient descent on a regression loss. Motivated by that construction, we show empirically that when training self-attention-only Transformers on simple regression tasks, either the models learned by GD and Transformers show great similarity or, remarkably, the solutions found by gradient descent converge in weight space to our construction. This allows us, at least on our simple regression tasks, to mechanistically understand the inner workings of Transformers that enable in-context learning. Finally, we discuss intriguing parallels to a mechanism identified as crucial for in-context learning termed induction-head (Olsson et al., 2022) and show how it could be generalized by in-context learning by gradient descent within Transformers.
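    The equivalence named in the abstract above can be checked numerically in a toy setting. The sketch below is an illustration only, not the paper's construction: it assumes scalar targets, a weight vector initialised at zero, and a softmax-free attention readout whose keys are the inputs x_i, whose values are the targets y_i, and whose query is the test input. The function names and the toy data generator are hypothetical.

```python
import random


def gd_step_prediction(xs, ys, x_query, lr):
    """Predict y for x_query after one gradient descent step from w = 0
    on the loss L(w) = 1/(2N) * sum_i (w . x_i - y_i)^2."""
    n, d = len(xs), len(x_query)
    # At w = 0 the gradient is -(1/N) * sum_i y_i * x_i.
    grad = [-sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]
    w = [-lr * g for g in grad]
    return sum(wj * qj for wj, qj in zip(w, x_query))


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention readout: keys x_i, values y_i, query x_query.
    The factor lr / N plays the role of the learning rate (an assumption)."""
    n = len(xs)
    scores = [sum(kj * qj for kj, qj in zip(x, x_query)) for x in xs]
    return (lr / n) * sum(s * y for s, y in zip(scores, ys))


def make_toy_task(n=8, d=3, seed=0):
    """Hypothetical toy data: noiseless linear regression examples."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(d)]
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ys = [sum(w * v for w, v in zip(w_true, x)) for x in xs]
    x_query = [rng.gauss(0, 1) for _ in range(d)]
    return xs, ys, x_query


xs, ys, x_query = make_toy_task()
gd_pred = gd_step_prediction(xs, ys, x_query, lr=0.1)
attn_pred = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print(gd_pred, attn_pred)
```

    Both functions compute (lr / N) * sum_i y_i (x_i . x_query), so the two printed values agree up to floating-point rounding. Trained Transformers and multi-step gradient descent are outside the scope of this sketch.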
    Training an end-to-end differentiable, self-organising cellular automata model of morphogenesis, able to both grow and regenerate specific patterns.
    We present a Message Passing based Learning Protocol (MPLP) for artificial neural networks. With this protocol, every synapse (weight or bias) and every activation is considered an independent agent, responsible for ingesting incoming messages, updating its own state, and outputting n-dimensional messages for its neighbours. We show how this protocol can be used instead of a traditional gradient-based approach for traditional feed-forward neural networks, and present a framework capable of generalizing neural networks to explore more flexible architectures. We meta-learn the MPLP through end-to-end gradient-based meta-optimisation. Finally, we discuss where the strengths of MPLP lie, and where we foresee possible limitations.
    The Building Blocks of Interpretability
    Christopher Olah
    Arvind Satyanarayan
    Ian Johnson
    Shan Carter
    Ludwig Schubert
    Katherine Ye
    Distill (2018)
    Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them, and the rich structure of this combinatorial space.
    Associative Domain Adaptation
    Philip Haeusser
    Thomas Frerix
    Daniel Cremers
    International Conference on Computer Vision (ICCV), IEEE (2017) (to appear)
    We propose associative domain adaptation, a novel technique for end-to-end domain adaptation with neural networks, the task of inferring class labels for an unlabeled target domain based on the statistical properties of a labeled source domain. Our training scheme follows the paradigm that in order to effectively derive class labels for the target domain, a network should produce statistically domain invariant embeddings, while minimizing the classification error on the labeled source domain. We accomplish this by reinforcing associations between source and target data directly in embedding space. Our method can easily be added to any existing classification network with no structural and almost no computational overhead. We demonstrate the effectiveness of our approach on various benchmarks and achieve state-of-the-art results across the board with a generic convolutional neural network architecture not specifically tuned to the respective tasks. Finally, we show that the proposed association loss produces embeddings that are more effective for domain adaptation compared to methods employing maximum mean discrepancy as a similarity measure in embedding space.
    Feature Visualization
    Christopher Olah
    Ludwig Schubert
    Distill (2017)
    Neural network feature visualization is a powerful technique. It can answer questions about what a network, or parts of a network, is looking for by generating idealized examples of what the network is trying to find. Over the last few years, the field has made great strides in feature visualization. Actually getting it to work, however, involves a number of details. In this article, we examine the major issues and explore common approaches to solving them. We find that remarkably simple methods can produce state-of-the-art visualizations, and that, surprisingly, these visualizations are often limited by optimization problems that can be solved with standard techniques.
    Learning by Association - A versatile semi-supervised training method for neural networks
    Philip Haeusser
    Daniel Cremers
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
    In many real-world scenarios, labeled data for a specific training task is costly to obtain. Semi-supervised methods make use of abundantly available unlabeled data and a smaller number of labeled examples. We propose a new framework for semi-supervised training of deep neural networks that is inspired by learning in humans. "Associations" are made from embeddings of labeled samples to those of unlabeled ones and back. The optimization schedule encourages correct association cycles that end up at the same class where the association was started from and penalizes wrong associations that end at a different class. The implementation is easy to use and can be added to any existing end-to-end training setup. We demonstrate the capabilities of our approach on several data sets and show that it can improve performance on classification tasks up to state of the art, making use of additionally available unlabeled data. We also show how to apply this to the task of domain adaptation, surpassing current state-of-the-art results.
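    The association cycles described in the abstract above can be sketched as a round trip through embedding space. The abstract does not specify how associations are scored, so the choices below are assumptions made for illustration: dot-product similarity, softmax transition probabilities, and a cross-entropy penalty against a uniform target over same-class labeled samples. All names and the toy embeddings are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def round_trip_probabilities(labeled, unlabeled):
    """P[i][j]: probability of stepping from labeled sample i to any
    unlabeled sample and back to labeled sample j."""
    sim = [[sum(a * b for a, b in zip(la, un)) for un in unlabeled]
           for la in labeled]
    p_ab = [softmax(row) for row in sim]                   # labeled -> unlabeled
    p_ba = [softmax(list(col)) for col in zip(*sim)]       # unlabeled -> labeled
    n, m = len(labeled), len(unlabeled)
    return [[sum(p_ab[i][k] * p_ba[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]


def cycle_loss(p_aba, labels):
    """Cross-entropy between each round-trip distribution and a uniform
    target over labeled samples of the starting sample's class."""
    loss = 0.0
    for i, row in enumerate(p_aba):
        same = [j for j, lab in enumerate(labels) if lab == labels[i]]
        loss -= sum(math.log(row[j] + 1e-8) for j in same) / len(same)
    return loss / len(p_aba)


labels = [0, 1]
unlabeled = [[4.0, 0.0], [0.0, 4.0], [5.0, 1.0]]
separated = [[5.0, 0.0], [0.0, 5.0]]   # classes far apart in embedding space
collapsed = [[1.0, 1.0], [1.0, 1.0]]   # classes indistinguishable

loss_separated = cycle_loss(round_trip_probabilities(separated, unlabeled), labels)
loss_collapsed = cycle_loss(round_trip_probabilities(collapsed, unlabeled), labels)
print(loss_separated, loss_collapsed)
```

    In this toy data, well-separated embeddings make round trips end at the starting class, so their loss is lower than for collapsed embeddings, where a round trip is equally likely to end at either class.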