Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10081 publications
Solidarity not Charity! Empowering Local Communities for Disaster Relief during COVID-19 through Grassroots Support
Jeongwon Jo
Oluwafunke Alliyu
John M. Carroll
Computer Supported Cooperative Work (2024) (2024)
Preview abstract
The COVID-19 pandemic brought wide-ranging, unanticipated societal changes as communities rushed to slow the spread of the novel coronavirus. In response, mutual aid groups bloomed online across the United States to fill in the gaps in social services and help local communities cope with infrastructural breakdowns. Unlike many previous disasters, the long-haul nature of COVID-19 necessitates sustained disaster relief efforts. In this paper, we conducted an interview study with online mutual aid group administrators to understand how groups facilitated disaster relief, and how disaster relief initiatives developed and maintained over the course of the first year of COVID-19. Our findings suggest that the groups were crucial sources of community-based support for immediate needs, innovated long-term solutions for chronic community issues and grew into a vehicle for justice-centered work. Our insights shed light on the strength of mutual aid as a community capacity that can support communities to collectively be more prepared for future long-haul disasters than they were with COVID-19.
View details
Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
Shuo Yang
Aniruddh Nath
Yang Liu
Li Wei
Shawn Andrews
Maciej Kula
Jarrod Kahn
Zhe Zhao
Lichan Hong
Preview abstract
Knowledge Distillation (KD) is a powerful approach for compressing large models into smaller, more efficient models, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring the consistent and reliable generation of high-quality teacher labels from continuous data streams.
View details
Perspective Chapter: Assessment of Subjective and Objective Sleep Quality from Wrist-Worn Wearable Data
Ben Yetton
Daniel McDuff
Andrew Barakat
Allen Jiang
Nicholas Allen
Logan Schneider
Ari Winbush
Conor Heneghan
Preview abstract
Researchers are interested in measuring both objective and subjective assessments of sleep, and associated phenomena such as sleepiness, quality and restoration. Predicting perceived sleep quality accurately from objective measurements remains an unsolved and interesting problem. Previous studies using polysomnograms and actigraphy have shown poor concordance between objective metrics and subjective sleep quality, but were often limited by study duration (e.g., one or two nights of PSG, study population in low 100 s). In this chapter, we consider whether consumer sleep trackers could significantly improve the assessment of subjective sleep quality through longer periods of assessment and larger data scale. We describe a recent study that modeled two subjective sleep quality metrics (PROMIS Sleep-Related Impairment (SI) and Sleep Disturbance (SD) Index) from objective sleep metrics acquired from a consumer wearable device (Fitbit). However, the goodness-of-fit parameter remains relatively low, even with the increased data availability and scale of data provided by consumer wearables. Specifically, for a well-characterized normative population of 2106 adults, we see that a linear multivariate model produces an R2 of 0.107 for predicting SI and R2 of 0.147 for SR, consistent with prior results using PSG and actigraphy. We conclude that subjective sleep quality remains broadly a psychological construct that cannot be fully modeled solely by objective sleep metrics.
View details
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Michael Xieyang Liu
Krystal Kallarackal
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024)
Preview abstract
Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from two models are qualitatively different. We iteratively designed and developed the tool by closely working with researchers and engineers at Google. This paper details the user challenges we identified, the design and development of the tool, and an observational study with participants who regularly evaluate their models.
View details
Preview abstract
We propose a new Markov Decision Process (MDP) model for ad auctions to capture the
user response to the quality of ads, with the objective of maximizing the long-term discounted
revenue. By incorporating user response, our model takes into consideration all three parties
involved in the auction (advertiser, auctioneer, and user). The state of the user is modeled as a
user-specific click-through rate (CTR) with the CTR changing in the next round according to the
set of ads shown to the user in the current round. We characterize the optimal mechanism for this MDP as a Myerson’s auction with a notion of modified virtual value, which relies on the value distribution of the advertiser, the current user state, and the future impact of showing the ad to the user. Leveraging this characterization, we design a sample-efficient and computationally-efficient algorithm which outputs an approximately optimal policy that requires only sample access to the true MDP and the value distributions of the bidders. Finally, we propose a simple mechanism built upon second price auctions with personalized reserve prices and show it can achieve a constant-factor approximation to the optimal long term discounted revenue.
View details
Preview abstract
Use of Text-to-Image models is expanding beyond generating generic objects, as they are increasingly being adopted by diverse global communities to create visual representations of their unique culture. Current T2I benchmarks primarily evaluate image-text alignment, aesthetics and fidelity of generations for complex prompts with generic objects, overlooking the critical dimension of cultural understanding. In this work, we address this gap by defining a framework to evaluate cultural competence of T2I models, and present a scalable approach to collect cultural artifacts unique to a particular culture from Knowledge Graphs and Large Language Models in tandem. We assess the ability of state-of-the-art T2I models to generate culturally faithful and realistic images across 8 countries and 3 cultural domains. Furthermore, we emphasize the importance of T2I models reflecting a culture's diversity and introduce cultural diversity as a novel metric for T2I evaluation, drawing inspiration from the Vendi Score. We introduce T2I-GCube, a first-of-its-kind benchmark for T2I evaluation. T2I-GCube includes cultural prompts, metrics, and cultural concept spaces, enabling comprehensive assessment of T2I models' cultural knowledge and diversity. Our evaluations reveal significant gaps in the cultural knowledge of existing models and provide valuable insights into the diversity of image outputs for under-specified prompts. By introducing a novel approach to evaluating cultural diversity and knowledge in T2I models, T2I-GCube will be instrumental in fostering the development of models with enhanced cultural competence.
View details
Generalizing Tree-Level Sap Flow Across the European Continent
Ralf Loritz
Chen Huan Wu
Daniel Klotz
Martin Gauch
Frederik Kratzert
Maoya Bassiouni
Geophysical Research Letters (2024)
Preview abstract
Sap flow offers key insights about transpiration dynamics and forest-climate interactions. Accurately simulating sap flow remains challenging due to measurement uncertainties and interactions between global and local environmental controls. Addressing these complexities, this study leveraged Long Short-Term Memory networks (LSTMs) with SAPFLUXNET to predict hourly tree-level sap flow across Europe. We built models with diverse training sets to assess performance under previously unseen conditions. The average Kling-Gupta Efficiency was 0.77 for models trained on 50% of time series across all forest stands, and 0.52 for models trained on 50% of the forest stands. Continental models not only matched but surpassed the performance of specialized and baselines for all genera and forest types, showcasing the capacity of LSTMs to effectively generalize across tree genera, climates, and forest ecosystems given minimal inputs. This study underscores the potential of LSTMs in generalizing state-dependent ecohydrological processes and bridging tree level measurements to continental scales.
View details
Preview abstract
Service providers of large language model (LLM) applications collect user instructions in the wild and use them in further aligning LLMs with users’ intentions. These instructions, which potentially contain sensitive information, are annotated by human workers in the process. This poses a new privacy risk not addressed by the typical private optimization. To this end, we propose using synthetic instructions to replace real instructions in data annotation and model fine-tuning. Formal differential privacy is guaranteed by generating those synthetic instructions using privately fine-tuned generators. Crucial in achieving the desired utility is our novel filtering algorithm that matches the distribution of the synthetic instructions to that of the real ones. In both supervised fine-tuning and reinforcement learning from human feedback, our extensive experiments demonstrate the high utility of the final set of synthetic instructions by showing comparable results to real instructions. In supervised fine-tuning, models trained with private synthetic instructions outperform leading open-source models such as Vicuna
View details
An artificial neural network to estimate the foliar and ground cover input variables of the Rangeland Hydrology and Erosion Model
Mahmoud Saeedimoghaddam
David Goodrich
Mariano Hernandez
David Phillip Guertin
Loretta J. Metz
Guillermo Ponce-Campos
Haiyan Wei
Shea Burns
Sarah E. McCord
Mark A. Nearing
C. Jason Williams
Carrie-Ann Houdeshell
Mashrekur Rahman
Menberu B. Meles
Steve Barker
Journal of Hydrology (2024)
Preview abstract
Models like the Rangeland Hydrology and Erosion Model (RHEM) are useful for estimating soil erosion, however, they rely on input parameters that are sometimes difficult or expensive to measure. Specifically, RHEM requires information about foliar and ground cover fractions that generally must be measured in situ, which makes it difficult to use models like RHEM to produce erosion or soil risk maps for areas exceeding the size of a hillslope such as a large watershed. We previously developed a deep learning emulator of RHEM that has low computational expense and can, in principle, be run over large areas (e.g., over the continental US). In this paper, we develop a deep learning model to estimate the RHEM ground cover inputs from remote sensing time series, reducing the need for extensive field surveys to produce erosion maps. We achieve a prediction accuracy on hillslope runoff of r2=0.9, and on soil loss and sediment yield of r2 = 0.4 at 66,643 field locations within the US. We demonstrate how this approach can be used for mapping by developing runoff, soil loss, and sediment yield maps over a 1356 km2 region of interest in Nebraska.
View details
Understanding Use Cases for AI-Powered Visual Interpretation Services
Ricardo Gonzalez
Jazmin Collins
Shiri Azenkot
CHI Conference on Human-Computer Interaction (2024)
Preview abstract
"Scene description" applications that describe visual content in a photo are useful daily tools for blind and low vision (BLV) people. Researchers have
studied their use, but they have only explored those that leverage remote sighted assistants; little is known about applications that use AI to generate
their descriptions. Thus, to investigate their use cases, we conducted a two-week diary study where 16 BLV participants used an AI-powered scene description
application we designed. Through their diary entries and follow-up interviews, users shared their information goals and assessments of the visual descriptions
they received. We analyzed the entries and found frequent use cases, such as identifying visual features of known objects, and surprising ones, such as avoiding contact with dangerous objects. We also found users scored the descriptions relatively low on average,
2.76 out of 5 (SD=1.49) for satisfaction and 2.43 out of 4 (SD=1.16) for trust, showing that descriptions still need signifcant improvements to deliver
satisfying and trustworthy experiences. We discuss future opportunities for AI as it becomes a more powerful accessibility tool for BLV users.
View details
Stable quantum-correlated many-body states through engineered dissipation
Xiao Mi
Alexios Michailidis
Sara Shabani
Jerome Lloyd
Rajeev Acharya
Igor Aleiner
Trond Andersen
Markus Ansmann
Frank Arute
Kunal Arya
Juan Atalaya
Gina Bortoli
Alexandre Bourassa
Leon Brill
Michael Broughton
Bob Buckley
Tim Burger
Nicholas Bushnell
Jimmy Chen
Benjamin Chiaro
Desmond Chik
Charina Chou
Josh Cogan
Roberto Collins
Paul Conner
William Courtney
Alex Crook
Ben Curtin
Alejo Grajales Dau
Dripto Debroy
Agustin Di Paolo
ILYA Drozdov
Andrew Dunsworth
Lara Faoro
Edward Farhi
Reza Fatemi
Vinicius Ferreira
Ebrahim Forati
Brooks Foxen
Élie Genois
William Giang
Dar Gilboa
Raja Gosula
Steve Habegger
Michael Hamilton
Monica Hansen
Sean Harrington
Paula Heu
Markus Hoffmann
Trent Huang
Ashley Huff
Bill Huggins
Sergei Isakov
Justin Iveland
Cody Jones
Pavol Juhas
Kostyantyn Kechedzhi
Marika Kieferova
Alexei Kitaev
Andrey Klots
Alexander Korotkov
Fedor Kostritsa
John Mark Kreikebaum
Dave Landhuis
Pavel Laptev
Kim Ming Lau
Lily Laws
Joonho Lee
Kenny Lee
Yuri Lensky
Alexander Lill
Wayne Liu
Orion Martin
Amanda Mieszala
Shirin Montazeri
Alexis Morvan
Ramis Movassagh
Wojtek Mruczkiewicz
Charles Neill
Ani Nersisyan
Michael Newman
JiunHow Ng
Murray Ich Nguyen
Tom O'Brien
Alex Opremcak
Andre Petukhov
Rebecca Potter
Leonid Pryadko
Charles Rocque
Negar Saei
Kannan Sankaragomathi
Henry Schurkus
Christopher Schuster
Mike Shearn
Aaron Shorter
Noah Shutty
Vladimir Shvarts
Jindra Skruzny
Clarke Smith
Rolando Somma
George Sterling
Doug Strain
Marco Szalay
Alfredo Torres
Guifre Vidal
Cheng Xing
Jamie Yao
Ping Yeh
Juhwan Yoo
Grayson Young
Yaxing Zhang
Ningfeng Zhu
Jeremy Hilton
Anthony Megrant
Yu Chen
Vadim Smelyanskiy
Dmitry Abanin
Science, 383 (2024), pp. 1332-1337
Preview abstract
Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-range quantum correlations and a ground-state fidelity of 0.86 for 18 qubits at the critical point. In two dimensions, we found mutual information that extends beyond nearest neighbors. Lastly, by coupling the system to auxiliaries emulating reservoirs with different chemical potentials, we explored transport in the quantum Heisenberg model. Our results establish engineered dissipation as a scalable alternative to unitary evolution for preparing entangled many-body states on noisy quantum processors.
View details
Beyond SOT: Tracking Multiple Generic Objects at Once
Christoph Mayer
Martin Danelljan
Vittorio Ferrari
Luc Van Gool
WACV'24 (2024)
Preview abstract
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single object setting. However multiobject GOT poses its own challenges and is more attractive in real-world applications. We attribute the lack of research interest into this problem to the absence of suitable benchmarks. In this work, we introduce a new largescale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence. Our benchmark allows users to tackle key remaining challenges in GOT, aiming to increase robustness and reduce computation through joint
tracking of multiple objects simultaneously. In addition, we propose a transformer-based GOT tracker baseline capable of joint processing of multiple objects through shared computation. Our approach achieves a 4× faster run-time in case of 10 concurrent objects compared to tracking each object independently and outperforms existing single object trackers on our new benchmark. In addition, our approach achieves highly competitive results on single-object GOT datasets, setting a new state of the art on TrackingNet with a success rate AUC of 84.4%. Our benchmark, code, results and trained models are available at https://github.com/visionml/pytracking.
View details
Preview abstract
Neural embedding models have become a fundamental component of modern information retrieval (IR) pipelines. These models produce a single embedding x ∈ R^d per data-point, allowing for fast retrieval via highly optimized maximum inner product search (MIPS) algorithms. Recently, beginning with the landmark ColBERT paper, multi-vector models, which produce a set of embedding per data point, have achieved markedly superior performance for IR tasks. Unfortunately, using these models for IR is computationally expensive due to the increased complexity of multi-vector retrieval and scoring.
In this paper, we introduce MUVERA (Multi-Vector Retrieval Algorithm), a retrieval mechanism which reduces multi-vector similarity search to single-vector similarity search. This enables the usage of off-the-shelf MIPS solvers for multi-vector retrieval. MUVERA asymmetrically generates Fixed Dimensional Encodings (FDEs) of queries and documents, which are vectors whose inner product approximates multi-vector similarity. We prove that FDEs give high-quality ε-approximations, thus providing the first single-vector proxy for multi-vector similarity with theoretical guarantees. Empirically, we find that FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5× fewer candidates. Compared to prior state of the art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets, achieving an average of 10% improved recall with 90% lower latency.
View details
LinguaMeta: Unified Metadata for Thousands of Languages
Uche Okonkwo
Emily Drummond
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Preview abstract
We introduce LinguaMeta, a unified resource for language metadata for thousands of languages, including language codes, names, number of speakers, writing systems, countries, official status, coordinates, and language varieties. The resources are drawn from various existing repositories and supplemented with our own research. Each data point is tagged for its origin, allowing us to easily trace back to and improve existing resources with more up-to-date and complete metadata. The resource is intended for use by researchers and organizations who aim to extend technology to thousands of languages.
View details
Using Early Readouts to Mediate Featural Bias in Distillation
Rishabh Tiwari
Durga Sivasubramanian
Anmol Mekala
Ganesh Ramakrishnan
WACV 2024 (2024)
Preview abstract
Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. This vulnerability is aggravated in distillation, where a (student) model may have less representational capacity than the corresponding teacher model. Often, knowledge of specific problem features is used to reweight instances & rebalance the learning process. We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers. We show that these early readouts automatically identify problem instances or groups in the form of confident, incorrect predictions. We improve group fairness measures across benchmark datasets by leveraging these signals to mediate between teacher logits and supervised label. We extend our results to the closely related but distinct problem of domain generalization, which also critically depends on the quality of learned features. We provide secondary analyses that bring insight into the role of feature learning in supervision and distillation.
View details