Emily Reif
I'm a software engineer on Google's People and AI Research team. I make tools for researchers, students, laypeople and other end users to better understand the ML models that are now ubiquitous in our lives. A short list of these projects includes the Embedding Projector, this recent paper on interpretability for language models, the Waterfall of Meaning, and SMILY, a tool for pathologists.
Authored Publications
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Minsuk Kahng
Michael Xieyang Liu
Krystal Kallarackal
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024)
Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from two models are qualitatively different. We iteratively designed and developed the tool by closely working with researchers and engineers at Google. This paper details the user challenges we identified, the design and development of the tool, and an observational study with participants who regularly evaluate their models.
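As a rough illustration of the kind of automatic side-by-side evaluation whose results the tool visualizes, the sketch below asks a judge LLM to compare two responses to the same prompt; `call_judge_model`, the prompt template, and the JSON verdict format are hypothetical stand-ins rather than the pipeline described in the paper.

```python
# Minimal sketch of automatic side-by-side evaluation with an LLM judge.
import json

JUDGE_TEMPLATE = """You are comparing two model responses to the same prompt.
Prompt: {prompt}
Response A: {response_a}
Response B: {response_b}
Reply with JSON: {{"winner": "A" | "B" | "tie", "rationale": "..."}}"""

def call_judge_model(prompt_text: str) -> str:
    """Hypothetical wrapper around whatever LLM endpoint is available; returns raw text."""
    raise NotImplementedError

def side_by_side_eval(examples: list[dict]) -> list[dict]:
    """Collects per-example verdicts for later aggregation and visualization."""
    results = []
    for ex in examples:  # each ex has keys: prompt, response_a, response_b
        raw = call_judge_model(JUDGE_TEMPLATE.format(**ex))
        verdict = json.loads(raw)  # assumes the judge follows the requested JSON format
        results.append({"prompt": ex["prompt"], **verdict})
    return results
```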
Understanding the Dataset Practitioners Behind Large Language Models
Minsuk Kahng
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM, Honolulu, HI, USA (2024)
As large language models (LLMs) become more advanced and impactful, it is increasingly important to scrutinize the data that they rely upon and produce. What does it mean to be a dataset practitioner doing this work? We approach this question in two parts: first, we define the role of "dataset practitioners" by performing a retrospective analysis of the responsibilities of teams contributing to LLM development at a technology company, Google. Then, we conduct semi-structured interviews with a cross-section of these practitioners (N=10). We find that although data quality is a top priority, there is little consensus around what data quality is and how to evaluate it. Consequently, practitioners either rely on their own intuition or write custom code to evaluate their data. We discuss potential reasons for this phenomenon and opportunities for alignment.
Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
Minsuk Kahng
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM, Honolulu, HI, USA (2024), pp. 9
Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data practitioners often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific, e.g., instruments and genres for a music dataset, or diseases and symptoms for a medical dataset. Accordingly, data practitioners often run custom analyses for each dataset, which is cumbersome and difficult, or use unsupervised methods. We present AutoHistograms, a visualization tool leveraging LLMs. AutoHistograms automatically identifies relevant entity-based features, visualizes their distributions, and allows the user to interactively query the dataset for new categories of entities. In a user study with data practitioners (n=10), we observe that participants were able to quickly onboard to AutoHistograms, use the tool to identify actionable insights, and conceptualize a broad range of applicable use cases. We also describe a variety of usage scenarios from different types of users to highlight how the tool can provide value in many different contexts. Finally, we present a quantitative evaluation of the tool. Together, this tool and user study contribute to the growing field of LLM-assisted sensemaking tools.
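As a loose sketch of the underlying pattern (LLM-derived entity features aggregated into dataset-level histograms), the snippet below tags each document with entities via a hypothetical `extract_entities` LLM call and counts them; it is an assumption about the workflow, not the AutoHistograms implementation.

```python
from collections import Counter

def extract_entities(text: str, category: str) -> list[str]:
    """Hypothetical LLM call returning entities of `category` found in `text`,
    e.g. category="instrument" -> ["guitar", "violin"]."""
    raise NotImplementedError

def entity_histogram(dataset: list[str], category: str) -> Counter:
    """Aggregates per-document entities into a dataset-level histogram."""
    counts = Counter()
    for doc in dataset:
        counts.update(extract_entities(doc, category))
    return counts

# Usage sketch: entity_histogram(music_reviews, "instrument").most_common(20)
```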
Who’s asking? User personas and the mechanics of latent misalignment
Ann Yuan
Marius Guerard
Michael Lepori
NeurIPS 2024 (2024)
Why do models respond to harmful queries in some cases but not others?
Despite significant investments in improving model safety, it has been shown that misaligned capabilities remain latent in safety-tuned models. In this work, we shed light on the mechanics of this phenomenon. First, we show that even when model generations are safe, harmful content persists in hidden representations, and this content can be extracted by decoding from earlier layers. Then, we show that whether the model divulges such content depends significantly on who it is talking to, which we refer to as user persona. We study both natural language prompting and activation steering as methods for manipulating inferred user persona and show that the latter is significantly more effective at bypassing safety filters. In fact, we find it is even more effective than direct attempts to control a model's refusal tendency.
This suggests that, when it comes to deciding whether to respond to harmful queries, the model is deeply biased with respect to user persona. We leverage the generative capabilities of the language model itself to investigate why certain personas break model safeguards, and discover that they enable the model to form more charitable interpretations of otherwise dangerous queries. Finally, we show that we can predict a persona’s effect on refusal given only the geometry of its steering vector.
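The sketch below illustrates activation steering in the general sense used here: a persona direction is added to the hidden states of one transformer layer during the forward pass. The hook-based pattern assumes a PyTorch, Hugging Face-style decoder; the layer path, scaling factor, and the way the steering vector is obtained are assumptions rather than the paper's exact setup.

```python
import torch

def add_steering_hook(model, layer_idx: int, steering_vec: torch.Tensor, alpha: float = 4.0):
    """Adds `alpha * steering_vec` to the residual-stream output of one decoder layer.

    Assumes a Hugging Face-style decoder whose layers return (hidden_states, ...).
    Returns the hook handle so the caller can later call handle.remove().
    """
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * steering_vec.to(hidden.dtype).to(hidden.device)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered

    layer = model.model.layers[layer_idx]  # module path varies by architecture (assumption)
    return layer.register_forward_hook(hook)

# Usage sketch: register the hook, generate as usual, then remove the hook and
# compare refusal rates with and without the persona direction applied.
```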
An Explorable explaining the concept of Patchscopes for an external audience. Patchscopes is an interpretability tool that allows researchers to better understand an LLM's output representations through natural language experiments.
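As a minimal sketch of the core idea (caching a hidden representation from one prompt and patching it into a forward pass on another prompt so the model can describe it in natural language), the code below uses a PyTorch/Hugging Face-style interface; the target prompt, layer-module path, and function name are illustrative assumptions, not the released Patchscopes tooling.

```python
import torch

@torch.no_grad()
def patchscope_probe(model, tokenizer, source_prompt: str, layer: int, source_pos: int = -1,
                     target_prompt: str = "cat -> cat; hello -> hello; ? ->"):
    """Reads a hidden representation by patching it into a different prompt."""
    # 1. Cache one token's hidden state from the source prompt.
    src_ids = tokenizer(source_prompt, return_tensors="pt").input_ids
    src_out = model(src_ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so layer k's output is index k + 1.
    patched_vec = src_out.hidden_states[layer + 1][0, source_pos]

    # 2. Patch it into the same layer's output while running the target prompt.
    def hook(_module, _inputs, output):
        hidden = output[0]
        hidden[0, -1] = patched_vec  # overwrite the last target token's state
        return (hidden,) + output[1:]

    handle = model.model.layers[layer].register_forward_hook(hook)  # path is an assumption
    tgt_ids = tokenizer(target_prompt, return_tensors="pt").input_ids
    next_token = model(tgt_ids).logits[0, -1].argmax().item()
    handle.remove()

    # 3. The decoded continuation describes, in natural language, what the patched vector encodes.
    return tokenizer.decode([next_token])
```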
Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models
Minsuk Kahng
IEEE Conference on Visualization and Visual Analytics (VIS), IEEE (2023)
Large language models (LLMs) can be used to generate smaller, more refined datasets via few-shot prompting for benchmarking, fine-tuning or other use cases. However, understanding and evaluating these datasets is difficult, and the failure modes of LLM-generated data are still not well understood. Specifically, the data can be repetitive in surprising ways, not only semantically but also syntactically and lexically. We present LinguisticLens, a novel interactive visualization tool for making sense of and analyzing syntactic diversity of LLM-generated datasets. LinguisticLens clusters text along syntactic, lexical, and semantic axes. It supports hierarchical visualization of a text dataset, allowing users to quickly scan for an overview and inspect individual examples. The live demo is available at https://shorturl.at/zHOUV.
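As a rough sketch of clustering along a syntactic axis rather than a semantic one (one of the axes the tool exposes), the snippet below represents each text by its part-of-speech sequence and clusters those signatures; the spaCy and scikit-learn usage is an assumed tooling choice, not LinguisticLens itself.

```python
# Cluster texts by syntactic shape (POS n-grams) instead of meaning.
import spacy
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("en_core_web_sm")

def pos_signature(text: str) -> str:
    """Replace each token with its part-of-speech tag, e.g. 'I love cats' -> 'PRON VERB NOUN'."""
    return " ".join(tok.pos_ for tok in nlp(text))

def syntactic_clusters(texts: list[str], k: int = 10):
    """Texts with the same cluster id share surface structure, even if their topics differ."""
    signatures = [pos_signature(t) for t in texts]
    features = TfidfVectorizer(ngram_range=(2, 3)).fit_transform(signatures)
    return KMeans(n_clusters=k, n_init="auto").fit_predict(features)
```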
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
Hyung Won Chung
Sebastian Gehrmann
Parker Schuh
Sasha Tsvyashchenko
Abhishek Rao
Yi Tay
Noam Shazeer
Nan Du
Reiner Pope
James Bradbury
Jacob Austin
Guy Gur-Ari
Pengcheng Yin
Toju Duke
Henryk Michalewski
Xavier Garcia
Liam Fedus
David Luan
Barret Zoph
Ryan Sepassi
David Dohan
Shivani Agrawal
Mark Omernick
Andrew M. Dai
Marie Pellat
Aitor Lewkowycz
Erica Moreira
Rewon Child
Oleksandr Polozov
Katherine Lee
Zongwei Zhou
Brennan Saeta
Michele Catasta
Jason Wei
Kathy Meier-Hellstern
arxiv:2204.02311 (2022)
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call the Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
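For readers unfamiliar with the few-shot setup referred to throughout, the snippet below shows the basic pattern: a handful of labeled examples are packed into the prompt and no gradient updates are made. The task, formatting, and the `generate` call are placeholders, not the PaLM evaluation harness.

```python
# Few-shot prompting: adapt a model with in-context examples rather than fine-tuning.
FEW_SHOT_EXAMPLES = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]

def build_few_shot_prompt(query: str) -> str:
    """Packs the labeled examples and the new query into a single prompt string."""
    shots = "\n".join(f"Review: {text}\nSentiment: {label}"
                      for text, label in FEW_SHOT_EXAMPLES)
    return f"{shots}\nReview: {query}\nSentiment:"

# prediction = generate(build_few_shot_prompt("A tedious, overlong mess."))  # hypothetical LLM call
```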
Neural networks have been adapted to leverage the structure and properties of graphs. We explore the components needed to build a graph neural network and motivate the design choices behind them.
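To make the components concrete, here is a minimal sketch of one message-passing layer of the kind the explorable walks through (aggregate neighbor features, then update each node); it is written in plain NumPy as an illustration and is not code from the article.

```python
import numpy as np

def gnn_layer(node_feats: np.ndarray, adj: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One message-passing step: each node averages its neighbors' features,
    combines them with its own, and applies a learned linear map plus a nonlinearity.

    node_feats: [num_nodes, d_in], adj: [num_nodes, num_nodes] of 0/1, weights: [2 * d_in, d_out]
    """
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    messages = adj @ node_feats / deg                       # mean over neighbors
    combined = np.concatenate([node_feats, messages], axis=1)
    return np.maximum(combined @ weights, 0.0)              # ReLU update

# Stacking several such layers lets information propagate across multi-hop neighborhoods.
```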
The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models
Tolga Bolukbasi
Andy Coenen
Sebastian Gehrmann
Ellen Jiang
Carey Radebaugh
Ann Yuan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics (2020)
We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface to enable rapid exploration and error analysis. We include case studies for a diverse set of workflows, including exploring counterfactuals for sentiment analysis, measuring gender bias in coreference systems, and exploring local behavior in text generation. LIT supports a wide range of models, including classification, seq2seq, and structured prediction, and is highly extensible through a declarative, framework-agnostic API. LIT is under active development, with code and full documentation available at https://github.com/pair-code/lit.
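As a generic illustration of the "controlled change in the input" workflow the abstract mentions (and not LIT's actual API, which is documented in the linked repository), the sketch below compares a model's predictions before and after a minimal edit; `predict_fn` is any assumed callable from text to class probabilities.

```python
def counterfactual_delta(predict_fn, text: str, original: str, replacement: str) -> dict:
    """Compares model output on `text` vs. a minimally edited counterfactual.

    `predict_fn` is an assumed callable mapping a string to class probabilities.
    """
    edited = text.replace(original, replacement)
    return {
        "original_text": text,
        "counterfactual_text": edited,
        "original_probs": predict_fn(text),
        "counterfactual_probs": predict_fn(edited),
    }

# Usage sketch: counterfactual_delta(model.predict, "He is a doctor.", "He", "She")
```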