Ettore Randazzo
Software engineer researching fundamental problems about artificial intelligence and life.
Authored Publications
Transformers learn in-context by gradient descent
João Sacramento
International Conference on Machine Learning (2023), pp. 35151-35174
Preview abstract
Transformers have become the state-of-the-art neural network architecture across numerous
domains of machine learning. This is partly due to their celebrated ability to transfer and
to learn in-context based on a few examples. Nevertheless, the mechanism of why and
how Transformers become in-context learners is not well understood and remains mostly an
intuition. Here, we argue that training Transformers on auto-regressive tasks can be closely
related to well-known gradient-based meta-learning formulations. We do so by providing
a simple construction that shows the equivalence of data transformations induced by 1) a
single linear self-attention layer and by 2) gradient descent on a regression loss. Motivated by
that construction, we show empirically that when training self-attention-only Transformers
on simple regression tasks either the models learned by GD and Transformers show great
similarity or, remarkably, the solutions found by gradient descent converge in weight space to
our construction. This allows us, at least on our simple regression tasks, to mechanistically
understand the inner workings of Transformers that enable in-context learning.
Finally, we discuss intriguing parallels to a mechanism identified as crucial for in-context
learning termed induction-head (Olsson et al., 2022) and show how it could be generalized
by in-context learning by gradient descent within Transformers.
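The equivalence mentioned in the abstract can be illustrated with a small sketch. This is a simplified, scalar-output illustration and not necessarily the paper's exact construction: it assumes the weights start at zero, the loss is the mean squared error over the in-context examples, and the attention layer is softmax-free with keys x_i, values y_i, and the query input as the query. The function names and the toy data are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def gd_step_prediction(xs, ys, x_query, lr):
    """Prediction after ONE gradient-descent step from w = 0 on
    L(w) = 1/(2N) * sum_i (w . x_i - y_i)^2."""
    n, d = len(xs), len(x_query)
    w = [0.0] * d  # assumed zero initialisation
    grad = [
        sum((dot(w, x) - y) * x[j] for x, y in zip(xs, ys)) / n
        for j in range(d)
    ]
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)

def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention readout: sum_i value_i * (key_i . query),
    with keys x_i, values y_i, and the output scaled by lr / N."""
    n = len(xs)
    return (lr / n) * sum(y * dot(x, x_query) for x, y in zip(xs, ys))

# Toy in-context regression task (hypothetical data).
w_true = [1.5, -0.7]
xs = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3]]
ys = [dot(w_true, x) for x in xs]
x_query = [0.4, 1.1]

pred_gd = gd_step_prediction(xs, ys, x_query, lr=0.1)
pred_attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print(math.isclose(pred_gd, pred_attn, rel_tol=1e-9, abs_tol=1e-12))
```

Under these assumptions the two predictions coincide, because the one-step weight update is (lr / N) * sum_i y_i x_i, and applying it to the query gives the same sum of values weighted by key-query dot products.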
Preview abstract
We present a Message Passing based Learning Protocol (MPLP) for artificial neural networks. With this protocol, every synapse (weights and biases) and every activation is considered an independent agent, responsible for ingesting incoming messages, updating its own state, and outputting n-dimensional messages for its neighbours. We show how this protocol can be used instead of a gradient-based approach for traditional feed-forward neural networks, and present a framework capable of generalizing neural networks to explore more flexible architectures. We meta-learn the MPLP through end-to-end gradient-based meta-optimisation. Finally, we discuss where the strengths of MPLP lie, and where we foresee possible limitations.
Self-classifying MNIST digit CA
Alexander Mordvintsev
Michael Levin
Sam Greydanus
Distill (2020)
Preview abstract
Training an end-to-end differentiable, self-organising cellular automata model able to self-classify in ever-changing MNIST digits.
Preview abstract
Training an end-to-end differentiable, self-organising cellular automata model of morphogenesis, able to both grow and regenerate specific patterns.