Anselm Levskaya

I studied physics at Cornell, biophysics at UCSF, and neuroscience at Stanford. At UCSF I was involved in early work in optogenetics with Chris Voigt and Wendell Lim, engineering light-sensitive proteins for direct control of intracellular signaling using patterned light controlled with computational microscopy. At Stanford I worked on light-field microscopy with the labs of Karl Deisseroth and Marc Levoy, developing the technique for optogenetic experiments in zebrafish and mice.

In industry, I founded a startup that built a high-throughput DNA synthesis pipeline combining single-molecule cloning, next-generation sequence verification, and physical selection by lasers to radically reduce error rates. I was one of the first employees at Cell Design Labs (acquired by Gilead), which developed next-generation engineered T-cell therapeutics for fighting blood cancers; there I developed novel synthetic Notch receptors for direct-contact cell-cell antigen sensing. I have also worked with startups applying deep learning to medical diagnostics and phenotypic screening.

I'm broadly interested in machine learning systems for assisting in the analysis and engineering of organisms, cells, and biological circuits, especially in moving beyond the data-poor, intuition-driven “artisanal” engineering approaches typical of existing biomedical projects. I believe we can leverage rich new biological data sources (high-throughput imaging, sequencing, high-dimensional cytometry, etc.) via deep-learning approaches to one day accelerate the development cycle of therapeutics and diagnostics. I'm also interested in low-level software infrastructure for deep learning and in more academic aspects of representation learning and generative models over images and sequences.
Authored Publications
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Hyung Won Chung, Sebastian Gehrmann, Parker Schuh, Sasha Tsvyashchenko, Abhishek Rao, Yi Tay, Noam Shazeer, Nan Du, Reiner Pope, James Bradbury, Guy Gur-Ari, Toju Duke, Henryk Michalewski, Xavier Garcia, Liam Fedus, David Luan, Barret Zoph, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Zongwei Zhou, Brennan Saeta, Michele Catasta, Jason Wei, Kathy Meier-Hellstern
arXiv:2204.02311 (2022)
Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
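To make the few-shot setup above concrete: rather than fine-tuning on task-specific data, a handful of labeled examples are written directly into the prompt and the model completes the pattern. The sketch below is purely illustrative and is not PaLM code; the names (build_prompt, generate, model) are hypothetical placeholders.

```python
# Minimal illustration of few-shot prompting: labeled examples go into the
# prompt itself, with no gradient updates on task-specific data.
few_shot_examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
]

def build_prompt(examples, query):
    """Format a few labeled examples followed by the new query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

prompt = build_prompt(few_shot_examples, "The plot dragged, but the acting was superb.")
# prediction = generate(model, prompt)  # hypothetical call to a language model
print(prompt)
```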
Studying Stand-Alone Self-Attention in Vision Models
Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Jon Shlens
NeurIPS (2019)
Abstract: Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models with content-based interactions, such as self-attention and non-local means, to achieve gains on a number of vision tasks. The natural question that arises is whether attention can be a standalone primitive for vision models instead of serving as just an augmentation on top of convolutions. In developing and testing a pure self-attention vision model, we verify that self-attention can indeed be an effective standalone layer. A simple procedure of replacing all instances of spatial convolutions with a form of self-attention applied to ResNet-50 produces a fully self-attentional model that outperforms the baseline on ImageNet classification with 12% fewer FLOPS and 29% fewer parameters. On COCO object detection, a fully self-attention model matches the mAP of a baseline RetinaNet while having 39% fewer FLOPS and 34% fewer parameters. Detailed ablation studies demonstrate that self-attention is especially impactful when used in later layers. These results establish that standalone self-attention is an important addition to the vision practitioner's toolbox.
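The core operation described above, attention restricted to a local pixel neighborhood in place of a spatial convolution, can be sketched compactly in JAX. This is a minimal single-head illustration under my own naming (local_self_attention, window, the projection matrices); it omits the multi-head structure and relative position embeddings of the paper's layer, so it shows the idea rather than reproducing the published implementation.

```python
import jax
import jax.numpy as jnp

def local_self_attention(x, wq, wk, wv, window=3):
    """Single-head self-attention over a window x window neighborhood of each
    pixel, used here as a stand-in for a spatial convolution.

    x: (H, W, C) feature map; wq, wk, wv: (C, D) projection matrices.
    Returns an (H, W, D) feature map.
    """
    H, W, _ = x.shape
    pad = window // 2
    q = x @ wq                                              # one query per pixel
    # Zero-pad keys/values so border pixels still see a full window
    # (a simplification; the paper handles positions via learned embeddings).
    k = jnp.pad(x @ wk, ((pad, pad), (pad, pad), (0, 0)))
    v = jnp.pad(x @ wv, ((pad, pad), (pad, pad), (0, 0)))

    def attend(i, j):
        # Gather the neighborhood of pixel (i, j) from the padded key/value maps.
        kn = jax.lax.dynamic_slice(k, (i, j, 0), (window, window, k.shape[-1]))
        vn = jax.lax.dynamic_slice(v, (i, j, 0), (window, window, v.shape[-1]))
        kn = kn.reshape(window * window, -1)
        vn = vn.reshape(window * window, -1)
        logits = kn @ q[i, j] / jnp.sqrt(q.shape[-1])        # (window*window,)
        return jax.nn.softmax(logits) @ vn                   # weighted sum of values

    rows, cols = jnp.arange(H), jnp.arange(W)
    return jax.vmap(lambda i: jax.vmap(lambda j: attend(i, j))(cols))(rows)

# Tiny usage example on a random 8x8x16 feature map.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 8, 16))
wq, wk, wv = (jax.random.normal(sk, (16, 16)) for sk in jax.random.split(key, 3))
y = local_self_attention(x, wq, wk, wv, window=3)
print(y.shape)  # (8, 8, 16)
```

In the paper, a layer of this kind (with multiple heads and relative position embeddings) replaces every spatial convolution in ResNet-50 to produce the fully self-attentional model evaluated above.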