Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10129 publications
    Preview abstract Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between them. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs). We introduce a high-quality human-annotated dataset for this problem and novel metrics to fairly assess performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline. We also propose a number of approaches to further improve their performance, with a relative improvement of 7% to 37% over the base model. However, we find that LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks. View details
    Preview abstract We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for a variety of scenarios, including learning from mini-batches, and task-incremental and class-incremental learning scenarios. View details
    Preview abstract There is a potential future where the content created by a human and an AI are indistinguishable. In this future, if you can’t tell the difference, does it matter? We conducted a 3 (Assigned creator: human, human with AI assistance, AI) by 4 (Context: news, travel, health, and jokes) mixed-design experiment where participants evaluated human-written content that was presented as created by a human, a human with AI assistance, or an AI. We found that participants felt more negatively about the content creator and were less satisfied when they thought AI was used, but assigned creator had no effect on content judgments. We also identified five interpretations for how participants thought AI use affected the content creation process. Our work suggests that informing users about AI use may not have the intended effect of helping consumers make content judgments and may instead damage the relationship between creators and followers. View details
    Preview abstract Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop. View details
    Complex Dynamics in Autobidding Systems
    Georgios Piliouras
    Kelly Spendlove
    Proceedings of the 25th ACM Conference on Economics and Computation (2024)
    Preview abstract It has become the default in markets such as ad auctions for participants to bid in an auction through automated bidding agents (autobidders) which adjust bids over time to satisfy return-over-spend constraints. Despite the prominence of such systems for the internet economy, their resulting dynamical behavior is still not well understood. Although one might hope that such relatively simple systems would typically converge to the equilibria of their underlying auctions, we provide a plethora of results that show the emergence of complex behavior, such as bi-stability, periodic orbits and quasi periodicity. We empirically observe how the market structure (expressed as motifs) qualitatively affects the behavior of the dynamics. We complement it with theoretical results showing that autobidding systems can simulate both linear dynamical systems as well as logical Boolean gates. View details
    Assessing Web Fingerprinting Risk
    Robert Busa-Fekete
    Antonio Sartori
    Proceedings of the ACM Web Conference (WWW 2024)
    Preview abstract Modern Web APIs allow developers to provide extensively customized experiences for website visitors, but the richness of the device information they provide also makes them vulnerable to being abused by malign actors to construct browser fingerprints, device-specific identifiers that enable covert tracking of users even when cookies are disabled. Previous research has established entropy, a measure of information, as the key metric for quantifying fingerprinting risk. Earlier studies that estimated the entropy of Web APIs were based on data from a single website or were limited to an extremely small sample of clients. They also analyzed each Web API separately and then summed their entropies to quantify overall fingerprinting risk, an approach that can lead to gross overestimates. We provide the first study of browser fingerprinting which addresses the limitations of prior work. Our study is based on actual visited pages and Web API function calls reported by tens of millions of real Chrome browsers in-the-wild. We accounted for the dependencies and correlations among Web APIs, which is crucial for obtaining more realistic entropy estimates. We also developed a novel experimental design that accurately estimates entropy while never observing too much information from any single user. Our results provide an understanding of the distribution of entropy for different website categories, confirm the utility of entropy as a fingerprinting proxy, and offer a method for evaluating browser enhancements which are intended to mitigate fingerprinting. View details
    Stable quantum-correlated many-body states through engineered dissipation
    Xiao Mi
    Alexios Michailidis
    Sara Shabani
    Jerome Lloyd
    Rajeev Acharya
    Igor Aleiner
    Trond Andersen
    Markus Ansmann
    Frank Arute
    Kunal Arya
    Juan Atalaya
    Gina Bortoli
    Alexandre Bourassa
    Leon Brill
    Michael Broughton
    Bob Buckley
    Tim Burger
    Nicholas Bushnell
    Jimmy Chen
    Benjamin Chiaro
    Desmond Chik
    Charina Chou
    Josh Cogan
    Roberto Collins
    Paul Conner
    William Courtney
    Alex Crook
    Ben Curtin
    Alejo Grajales Dau
    Dripto Debroy
    Agustin Di Paolo
    Ilya Drozdov
    Andrew Dunsworth
    Lara Faoro
    Edward Farhi
    Reza Fatemi
    Vinicius Ferreira
    Ebrahim Forati
    Brooks Foxen
    Élie Genois
    William Giang
    Dar Gilboa
    Raja Gosula
    Steve Habegger
    Michael Hamilton
    Monica Hansen
    Sean Harrington
    Paula Heu
    Markus Hoffmann
    Trent Huang
    Ashley Huff
    Bill Huggins
    Sergei Isakov
    Justin Iveland
    Cody Jones
    Pavol Juhas
    Kostyantyn Kechedzhi
    Marika Kieferova
    Alexei Kitaev
    Andrey Klots
    Alexander Korotkov
    Fedor Kostritsa
    John Mark Kreikebaum
    Dave Landhuis
    Pavel Laptev
    Kim Ming Lau
    Lily Laws
    Joonho Lee
    Kenny Lee
    Yuri Lensky
    Alexander Lill
    Wayne Liu
    Orion Martin
    Amanda Mieszala
    Shirin Montazeri
    Alexis Morvan
    Ramis Movassagh
    Wojtek Mruczkiewicz
    Charles Neill
    Ani Nersisyan
    Michael Newman
    JiunHow Ng
    Murray Ich Nguyen
    Tom O'Brien
    Alex Opremcak
    Andre Petukhov
    Rebecca Potter
    Leonid Pryadko
    Charles Rocque
    Negar Saei
    Kannan Sankaragomathi
    Henry Schurkus
    Christopher Schuster
    Mike Shearn
    Aaron Shorter
    Noah Shutty
    Vladimir Shvarts
    Jindra Skruzny
    Clarke Smith
    Rolando Somma
    George Sterling
    Doug Strain
    Marco Szalay
    Alfredo Torres
    Guifre Vidal
    Cheng Xing
    Jamie Yao
    Ping Yeh
    Juhwan Yoo
    Grayson Young
    Yaxing Zhang
    Ningfeng Zhu
    Jeremy Hilton
    Anthony Megrant
    Yu Chen
    Vadim Smelyanskiy
    Dmitry Abanin
    Science, 383 (2024), pp. 1332-1337
    Preview abstract Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-range quantum correlations and a ground-state fidelity of 0.86 for 18 qubits at the critical point. In two dimensions, we found mutual information that extends beyond nearest neighbors. Lastly, by coupling the system to auxiliaries emulating reservoirs with different chemical potentials, we explored transport in the quantum Heisenberg model. Our results establish engineered dissipation as a scalable alternative to unitary evolution for preparing entangled many-body states on noisy quantum processors. View details
    Traffic simulations: multi-city calibration of metropolitan highway networks
    Yechen Li
    Damien Pierce
    27th IEEE International Conference on Intelligent Transportation Systems (ITSC) (2024)
    Preview abstract This paper proposes an approach to perform travel demand calibration for high-resolution stochastic traffic simulators. It employs abundant travel times at the path-level, departing from the standard practice of resorting to scarce segment-level sensor counts. The proposed approach is shown to tackle high-dimensional instances in a sample-efficient way. For the first time, case studies on 6 metropolitan highway networks are carried out, considering a total of 54 calibration scenarios. This is the first work to show the ability of a calibration algorithm to systematically scale across networks. Compared to the state-of-the-art simultaneous perturbation stochastic approximation (SPSA) algorithm, the proposed approach enhances fit to field data by an average 43.5% with a maximum improvement of 80.0%, and does so within fewer simulation calls. View details
    Believing Anthropomorphism: Examining the Role of Anthropomorphic Cues on User Trust in Large Language Models
    Michelle Cohn
    Femi Olanubi
    Zion Mengesha
    Daniel Padgett
    ACM (Association for Computing Machinery) CHI Conference on Human Factors in Computing Systems (2024)
    Preview abstract People now regularly interface with Large Language Models (LLMs) via speech and text (e.g., Bard) interfaces. However, little is known about the relationship between how users anthropomorphize an LLM system (i.e., ascribe human-like characteristics to a system) and how they trust the information the system provides. Participants (n=2,165; ranging in age from 18-90 from the United States) completed an online experiment, where they interacted with a pseudo-LLM that varied in modality (text only, speech + text) and grammatical person (“I” vs. “the system”) in its responses. Results showed that the “speech + text” condition led to higher anthropomorphism of the system overall, as well as higher ratings of accuracy of the information the system provides. Additionally, the first-person pronoun (“I”) led to higher information accuracy and reduced risk ratings, but only in one context. We discuss these findings for their implications for the design of responsible, human–generative AI experiences. View details
    Preview abstract Large Language Models (LLMs) may offer transformative opportunities for text input, especially for physically demanding modalities like handwriting. We studied a form of abbreviated handwriting by designing, developing and evaluating a prototype, named SkipWriter, that converts handwritten strokes of a variable-length, prefix-based abbreviation (e.g., “ho a y” as handwritten strokes) into the intended full phrase (e.g., “how are you” in the digital format) based on preceding context. SkipWriter consists of an in-production handwriting recognizer and an LLM fine-tuned on this skip-writing task. With flexible pen input, SkipWriter allows the user to add and revise prefix strokes when predictions don’t match the user’s intent. A user evaluation demonstrated a 60% reduction in motor movements with an average speed of 25.78 WPM. We also showed that this reduction is close to the ceiling of our model in an offline simulation. View details
    Preview abstract We present XDTK, an open-source Unity/Android toolkit for prototyping multi-device interactions in extended reality (XR). With the Unity package and Android app provided in XDTK, data from any number of devices (phones, tablets, or wearables) can be streamed to and surfaced within a Unity-based XR application. ARCore-supported devices also provide self-tracked pose data. Devices on the same local network are automatically discovered by the Unity server and their inputs are routed using a custom event framework. We designed XDTK to be modular and easily extendable to enable fast, simple, and effective prototyping of multi-device experiences by both researchers and developers. View details
    Solving olympiad geometry without human demonstrations
    Trieu Trinh
    Yuhuai Tony Wu
    He He
    Nature, 625 (2024), pp. 476-482
    Preview abstract Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004. View details
    Solidarity not Charity! Empowering Local Communities for Disaster Relief during COVID-19 through Grassroots Support
    Jeongwon Jo
    Oluwafunke Alliyu
    John M. Carroll
    Computer Supported Cooperative Work (2024)
    Preview abstract The COVID-19 pandemic brought wide-ranging, unanticipated societal changes as communities rushed to slow the spread of the novel coronavirus. In response, mutual aid groups bloomed online across the United States to fill in the gaps in social services and help local communities cope with infrastructural breakdowns. Unlike many previous disasters, the long-haul nature of COVID-19 necessitates sustained disaster relief efforts. In this paper, we conducted an interview study with online mutual aid group administrators to understand how groups facilitated disaster relief, and how disaster relief initiatives developed and were maintained over the course of the first year of COVID-19. Our findings suggest that the groups were crucial sources of community-based support for immediate needs, innovated long-term solutions for chronic community issues and grew into a vehicle for justice-centered work. Our insights shed light on the strength of mutual aid as a community capacity that can support communities to collectively be more prepared for future long-haul disasters than they were with COVID-19. View details
    Preview abstract The emergence of synthetic data represents a pivotal shift in modern machine learning, offering a solution to satisfy the need for large volumes of data in domains where real data is scarce, highly private, or difficult to obtain. We investigate the feasibility of creating realistic, large-scale synthetic datasets of user-generated content, noting that such content is increasingly prevalent and a source of frequently sought information. Large language models (LLMs) offer a starting point for generating synthetic social media discussion threads, due to their ability to produce diverse responses that typify online interactions. However, as we demonstrate, straightforward application of LLMs yields limited success in capturing the complex structure of online discussions, and standard prompting mechanisms lack sufficient control. We therefore propose a multi-step generation process, predicated on the idea of creating compact representations of discussion threads, referred to as scaffolds. Our framework is generic yet adaptable to the unique characteristics of specific social media platforms. We demonstrate its feasibility using data from two distinct online discussion platforms. To address the fundamental challenge of ensuring the representativeness and realism of synthetic data, we propose a portfolio of evaluation measures to compare various instantiations of our framework. View details
    Preview abstract A vast amount of human discussion, storytelling, content creation, and reporting now occurs on social media platforms. As such, social media posts are often quoted on web pages as context. In this paper, we argue that these quotations and their surrounding page context provide a rich, platform-independent source of data for studying the intersection of natural language and social media. We introduce a taxonomy of quotation roles that categorizes how social media posts are used within content. We release a dataset of 38M social quotes derived from the Common Crawl, and role labels for a subset assessed by human raters. We show that the interplay of accounts, roles, and topics across the web graph reveals valuable social diffusion patterns, and that roles can be predicted with fine-tuned large language models from web context. View details