Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10132 publications
    Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions. However, many generated images still suffer from issues such as artifacts/implausibility, misalignment with text descriptions, and low aesthetic quality. Inspired by the success of Reinforcement Learning with Human Feedback (RLHF) for large language models, prior work collected human-provided scores as feedback on generated images and trained a reward model to improve the T2I generation. In this paper, we enrich the feedback signal by (i) marking image regions that are implausible or misaligned with the text, and (ii) annotating which keywords in the text prompt are not represented in the image. We collect such rich human feedback on 18K generated images and train a multimodal transformer to predict this rich feedback automatically. We show that the predicted rich human feedback can be leveraged to improve image generation, for example, by selecting high-quality training data to finetune and improve the generative models, or by creating masks with predicted heatmaps to inpaint the problematic regions. Notably, the improvements generalize to models (Muse) beyond those used to generate the images on which human feedback data were collected (Stable Diffusion variants).
    As AI systems quickly improve in both breadth and depth of performance, they lend themselves to creating increasingly powerful and realistic agents, including the possibility of agents modeled on specific people. We anticipate that within our lifetimes it may become common practice for people to create a custom AI agent to interact with loved ones and/or the broader world after death. We call these generative ghosts, since such agents will be capable of generating novel content rather than merely parroting content produced by their creator while living. In this paper, we first discuss the design space of potential implementations of generative ghosts. We then discuss the practical and ethical implications of generative ghosts, including potential positive and negative impacts on individuals and society. Based on these considerations, we lay out a research agenda for the AI and HCI research communities to empower people to create and interact with AI afterlives in a safe and beneficial manner.
    Pathfinder: High-Resolution Control-Flow Attacks with Conditional Branch Predictor
    Andrew Kwong
    Archit Agarwal
    Christina Garman
    Daniel Genkin
    Dean Tullsen
    Deian Stefan
    Hosein Yavarzadeh
    Max Christman
    Mohammadkazem Taram
    International Conference on Architectural Support for Programming Languages and Operating Systems, ACM (2024)
    This paper presents novel attack primitives that provide adversaries with the ability to read and write the path history register (PHR) and the prediction history tables (PHTs) of the conditional branch predictor in modern Intel CPUs. These primitives enable us to recover the recent control flow (the last 194 taken branches) and, in most cases, a nearly unlimited control flow history of any victim program. Additionally, we present a tool that transforms the PHR into an unambiguous control flow graph, encompassing the complete history of every branch. This work provides case studies demonstrating the practical impact of novel reading and writing/poisoning primitives. It includes examples of poisoning AES to obtain intermediate values and consequently recover the secret AES key, as well as recovering a secret image by capturing the complete control flow of libjpeg routines. Furthermore, we demonstrate that these attack primitives are effective across virtually all protection boundaries and remain functional in the presence of all recent control-flow mitigations from Intel.
    Embedding-Aligned Language Models
    Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
    We propose a novel approach for training large language models (LLMs) to adhere to objectives imposed by a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. An Embedding-Aligned Guided LanguagE (EAGLE) agent is trained using a significantly smaller language model to iteratively steer the LLM's generation towards optimal regions of a latent embedding space, given some predefined criteria. We demonstrate the effectiveness of the EAGLE agent using the MovieLens 25M dataset, on extrapolation tasks for content gap to satisfy latent user demand, and multi-attribute satisfaction for generating creative variations of entities. Our work paves the way for controlled and grounded text generation using LLMs, ensuring consistency with domain-specific knowledge and data representations.
    In this paper we study users' opinions about the privacy of their mobile health apps. We look at what they write in app reviews in the 'Health & Fitness' category on the Google Play store. We identified 2832 apps in this category (based on 1K minimum installs). Using NLP/LLM analyses, we find that 76% of these apps have at least some privacy reviews. In total this yields over 164,000 reviews about privacy, from over 150 countries and in 25 languages. Our analysis identifies top themes and offers an approximation of how widespread these issues are around the world. We show that the top 4 themes - Data Sharing and Exposure, Permission Requests, Location Tracking and Data Collection - are issues of concern in over 70 countries. Our automatically generated thematic summaries reveal interesting aspects that deserve further research around user suspicions (unneeded data collection), user requests (more fine-grained control over data collection and data access), as well as user behavior (uninstalling apps).
    API Governance at Scale
    Mak Ahmad
    JJ Geewax
    David R Karger
    Kwan-Liu Ma
    ICSE 2024 Software Engineering in Practice (2024)
    API Governance, the process of applying standardized sets of policies and guardrails to the design and development of APIs, has only grown in importance and prominence given the continued growth in APIs being produced. In this paper, we present an Action Research style approach to investigate and understand the utility of a multi-faceted API Governance process being adopted inside Google. We first reflect on past research around API Governance, and then introduce three new components: (1) API Improvement Proposals (AIPs), the documented source of truth for API design rules, (2) API Linter, an automated analysis tool which checks for adherence to / violations of AIPs, and (3) API Readability, a program to educate and certify API design experts. These three components are designed to build upon pre-existing processes to scale and improve API design. Through a mixed-methods research strategy, containing both a survey and a series of interviews, we evaluate the utility of these approaches in supporting API Producers. Our research shows that API Producers have positive sentiment towards API Governance, validating the general direction of the program. Specifically, our study participants highlighted the positive impact of API Governance on the quality of the APIs they produced, via consistency in both the outcome and approach. This paper also discusses future research opportunities to enhance API Governance, specifically with regard to newer API Producers, who reported worse sentiment towards the program than their more experienced peers.
    Quantum Computation of Stopping Power for Inertial Fusion Target Design
    Dominic Berry
    Alina Kononov
    Alec White
    Joonho Lee
    Andrew Baczewski
    Proceedings of the National Academy of Sciences, 121 (2024), e2317772121
    Stopping power is the rate at which a material absorbs the kinetic energy of a charged particle passing through it - one of many properties needed over a wide range of thermodynamic conditions in modeling inertial fusion implosions. First-principles stopping calculations are classically challenging because they involve the dynamics of large electronic systems far from equilibrium, with accuracies that are particularly difficult to constrain and assess in the warm-dense conditions preceding ignition. Here, we describe a protocol for using a fault-tolerant quantum computer to calculate stopping power from a first-quantized representation of the electrons and projectile. Our approach builds upon the electronic structure block encodings of Su et al. [PRX Quantum 2, 040332 2021], adapting and optimizing those algorithms to estimate observables of interest from the non-Born-Oppenheimer dynamics of multiple particle species at finite temperature. We also work out the constant factors associated with a novel implementation of a high-order Trotter approach to simulating a grid representation of these systems. Ultimately, we report logical qubit requirements and leading-order Toffoli costs for computing the stopping power of various projectile/target combinations relevant to interpreting and designing inertial fusion experiments. We estimate that scientifically interesting and classically intractable stopping power calculations can be quantum simulated with roughly the same number of logical qubits and about one hundred times more Toffoli gates than is required for state-of-the-art quantum simulations of industrially relevant molecules such as FeMoCo or P450.
    SoothSayer: Bypassing DSAC Mitigation by Predicting Counter Replacement
    Salman Qazi
    Fourth Workshop on DRAM Security (DRAMSec) (2024)
    In-DRAM Stochastic and Approximate Counting (DSAC) is a recently published algorithm that aims to mitigate Rowhammer at low cost. Existing in-DRAM counter-based schemes keep track of row activations and issue Targeted Row Refresh (TRR) upon detecting a concerning pattern. However, because their tracking ability is insufficient, they are vulnerable to attacks utilizing decoy rows. DSAC claims to improve upon existing TRR mitigation by filtering out decoy-row accesses, so they cannot saturate the limited number of counters available for detecting Rowhammer, promising a reliable mitigation without the area cost of deterministic and provable schemes such as per-row activation counting (PRAC). In this paper, we analyze DSAC and discover some gaps that make it vulnerable to Rowhammer and Rowpress attacks. The main focus of this work is a novel attack named SoothSayer that targets the counter replacement policy in DSAC by cloning the random number generator. We describe and simulate this attack, and establish its efficacy. Finally, we discuss other weaknesses in DSAC.
    Detecting offensive content in text is an increasingly central challenge for both social-media platforms and AI-driven technologies. However, offensiveness remains a subjective phenomenon as perspectives differ across sociodemographic characteristics, as well as cultural norms and moral values. This intricacy is largely ignored in the current AI-focused approaches for detecting offensiveness or related concepts such as hate speech and toxicity detection. We frame the task of determining offensiveness as essentially a matter of moral judgment: deciding the boundaries of ethically wrong vs. right language to be used or generated within an implied set of sociocultural norms. In this paper, we investigate how judgment of offensiveness varies across diverse global cultural regions, and the crucial role of moral values in shaping these variations. Our findings highlight substantial cross-cultural differences in perceiving offensiveness, with moral concerns about Caring and Purity as the mediating factor driving these differences. These insights are of importance as AI safety protocols, shaped by human annotators' inputs and perspectives, embed their moral values which do not align with the notions of right and wrong in all contexts, and for all individuals.
    Reinforcement Learning-Enhanced Cloud-Based Open Source Analog Circuit Generator for Standard and Cryogenic Temperatures in 130-nm and 180-nm OpenPDKs
    Ali Hammoud
    Anhang Li
    Ayushman Tripathi
    Wen Tian
    Harsh Khandeparkar
    Ryan Wans
    Boris Murmann
    Dennis Sylvester
    Mehdi Saligane
    This work introduces an open-source, Process Technology-agnostic framework for hierarchical circuit netlist, layout, and Reinforcement Learning (RL) optimization. The layout, netlist, and optimization Python API is fully modular and publicly installable via PyPI. It features a bottom-up hierarchical construction, which allows for complete design reuse across provided PDKs. The modular hierarchy also facilitates parallel circuit design iterations on cloud platforms. To illustrate its capabilities, a two-stage OpAmp with a 5T first-stage, common-source second-stage, and Miller compensation is implemented. We instantiate the OpAmp in two different open-source process design kits (OpenPDKs) using both room-temperature models and cryogenic (4K) models. With a human-designed version as the baseline, we leveraged the parameterization capabilities of the framework and applied the RL optimizer to adapt to the power consumption limits suitable for cryogenic applications while maintaining gain and bandwidth performance. Using the modular RL optimization framework we achieve a 6x reduction in power consumption compared to manually designed circuits while maintaining gain to within 2%.
    Welcome to the 16th edition of this column on recent books and journal articles in the field of public opinion, survey methods, survey statistics, Big Data, data science, and user experience research. Special issues of journals have a space in this article because, in our view, they are like edited books. We also added review papers from the journal series of Annual Reviews because these papers are seminal state-of-the-art write-ups, a mini book, if you wish, on a specific subject. This article is an update of the books and journals published in the 2022 article. Like the previous year, the books are organized by topic; this should help the readers to focus on their interests. You will note that we use very broad definitions of public opinion, survey methods, survey statistics, Big Data, data science, and user experience research. This is because there are many books published in different outlets that can be very useful to the readers of Survey Practice, even if they do not come from traditional sources of survey content. It is unlikely we have exhaustively listed all new books in each subcategory; we did our best scouting different resources and websites, but we take full responsibility for any omissions. The list is also focused only on books published in the English language and available for purchase (as an ebook or in print) at the time of this review (April 2024) and with the printed copyright year of 2023. Books are listed based on the relevance to the topic, and no judgment is made in terms of quality of the content. We let the readers do so. If you want to send information for the next issue, please send it to surveypractice.new.books@gmail.com.
    Solving the wide-band inverse scattering problem via equivariant neural networks
    Borong Zhang
    Qin Li
    Journal of Computational and Applied Mathematics (2024)
    This paper introduces a novel deep neural network architecture for solving the inverse scattering problem in the frequency domain with wide-band data, by directly approximating the inverse map, thus avoiding the expensive optimization loop of classical methods. The architecture is motivated by the filtered back-projection formula in the full aperture regime and with homogeneous background, and it leverages the underlying equivariance of the problem and compressibility of the integral operator. This drastically reduces the number of training parameters, and therefore the computational and sample complexity of the method. In particular, we obtain an architecture whose number of parameters scales sub-linearly with respect to the dimension of the inputs, while its inference complexity scales super-linearly but with very small constants. We provide several numerical tests that show that the current approach results in better reconstruction than optimization-based techniques such as full-waveform inversion, but at a fraction of the cost while being competitive with state-of-the-art machine learning methods.
    We propose Hierarchical Text Spotter (HTS), the first method for the joint task of word-level text spotting and geometric layout analysis. HTS can annotate text in images with a hierarchical representation of 4 levels: character, word, line, and paragraph. The proposed HTS is characterized by two novel components: (1) a Unified-Detector-Polygon (UDP) that produces Bezier Curve polygons of text lines and an affinity matrix for paragraph grouping between detected lines; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits lines into characters and further merges them back into words. HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as geometric layout analysis tasks. Code will be released upon acceptance.
    Mechanism Design for Large Language Models
    Paul Duetting
    Haifeng Xu
    Proceedings of the ACM on Web Conference 2024, Association for Computing Machinery, New York, NY, USA, 144–155
    We investigate auction mechanisms for AI-generated content, focusing on applications like ad creative generation. In our model, agents' preferences over stochastically generated content are encoded as large language models (LLMs). We propose an auction format that operates on a token-by-token basis, and allows LLM agents to influence content creation through single-dimensional bids. We formulate two desirable incentive properties and prove their equivalence to a monotonicity condition on output aggregation. This equivalence enables a second-price rule design, even absent explicit agent valuation functions. Our design is supported by demonstrations on a publicly available LLM.
    Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, these have evolved rather independently, with IO recently receiving more research attention. This paper seeks to bridge this gap by comprehensively comparing the performance of representative IO and ES techniques, both in isolation and in combination, on a diverse set of challenging tasks. Our findings reveal that intelligently reusing model-generated input-output pairs obtained from evaluating prompts on the validation set as exemplars consistently improves performance over IO methods but is currently under-investigated. We also find that despite the recent focus on IO, how we select exemplars can outweigh how we optimize instructions, with ES strategies as simple as random search outperforming state-of-the-art IO methods with seed instructions without any optimization. Moreover, we observe synergy between ES and IO, with optimal combinations surpassing individual contributions. We conclude that studying exemplar selection as a standalone method and its optimal combination with instruction optimization remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models.