Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11257 publications
    A Computer Vision Problem in Flatland
    Erin Connelly
    Annalisa Crannell
    Timothy Duff
    Rekha R. Thomas
    SIAM Journal on Applied Algebra and Geometry, 10 (2026), pp. 14-45
    Preview abstract When is it possible to project two sets of labeled points of equal cardinality lying in a pair of projective planes to the same image on a projective line? We give a complete answer to this question, obtaining the following results. We first show that such a pair of projections exist if and only if the two point sets are themselves images of a common point set in projective space. Moreover, we find that for generic pairs of point sets, a common projection exists if and only if their cardinality is at most seven. In these cases, we give an explicit description of the loci of projection centers that enable a common image. View details
    Usability Hasn’t Peaked: Exploring How Expressive Design Overcomes the Usability Plateau
    Alyssa Sheehan
    Bianca Gallardo
    Ying Wang
    Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26), April 13–17, 2026, Barcelona, Spain (2026)
    Preview abstract Critics have argued that mobile usability has largely been optimized, and that only incremental gains are possible. We set out to explore if the newest generation of design systems, which promote greater flexibility and a return to design basics, could produce substantially more usable designs while maintaining or increasing aesthetic judgments. Through a study with 48 diverse participants completing tasks in 10 different applications, we found that in designs created following Material 3 Expressive guidelines, users fixated on the correct screen element for a task 33% faster, completed tasks 20% faster, and rated experiences more positively compared to versions designed using the previous Material design system. These improvements in performance and aesthetic ratings challenge the premise of a usability plateau and show that mobile usability has not peaked. We illustrate specific opportunities to make mobile experiences more usable by returning to design fundamentals while highlighting risks of added flexibility. View details
    Preview abstract Object-Counting for remote-sensing (RS) imagery is raising increasing research interest due to its crucial role in a wide and diverse set of applications. While several promising methods for RS object-counting have been proposed, existing methods focus on a closed, pre-defined set of object classes. This limitation necessitates costly re-annotation and model re-training to adapt current approaches for counting of novel objects that have not been seen during training, and severely inhibits their application in dynamic, real-world monitoring scenarios. To address this gap, in this work we propose RS-OVC - an adaptation of existing work for Open Vocabulary Counting (OVC) approach from general computer vision to the RS domain. We show that our model is capable of accurate counting of novel object classes, that are unseen during training, based solely on textual and/or visual conditioning. View details
    Preview abstract The rapid expansion of the Internet of Things (IoT) and smart home ecosystems has led to a fragmented landscape of user data management across consumer electronics (CE) such as Smart TVs, gaming consoles, and set-top boxes. Current onboarding processes on these devices are characterized by high friction due to manual data entry and opaque data-sharing practices. This paper introduces the User Data Sharing System (UDSS), a platform-agnostic framework designed to facilitate secure, privacy-first PII (Personally Identifiable Information) exchange between device platforms and third-party applications. Our system implements a Contextual Scope Enforcement (CSE) mechanism that programmatically restricts data exposure based on user intent—specifically distinguishing between Sign-In and Sign-Up workflows. Unlike cloud-anchored identity standards such as FIDO2/WebAuthn, UDSS is designed for shared, device-centric CE environments where persistent user-to-device bind-ing cannot be assumed. We further propose a tiered access model that balances developer needs with regulatory compliance (GDPR/CCPA). A proof-of-concept implementation on a reference ARMv8 Linux-based middleware demonstrates that UDSS reduces user onboarding latency by 65% and measurably reduces PII over-exposure risk through protocol-enforced data minimization. This framework provides a standardized approach to identity management in the heterogeneous CE market. View details
    Managing and Securing Google's Fleet of Multi-Node Servers
    Richard Hanley
    Havard Skinnemoen
    Andrés Lagar-Cavilla
    Michael Wong
    Jeff Andersen
    Kishan Prasad
    Patrick Leis
    Shiva Rao
    Chris Koch
    Jad Baydoun
    Anna Sapek
    Communications of the ACM, 69:3 (2026), pp. 82 - 92
    Preview abstract Server hardware and software co-design for a secure, efficient cloud. View details
    ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding
    Sunny Rajagopalan
    Alireza Golestaneh
    Shubhra Chandra
    Min Zhou
    Jonathan Vronsky
    Songbai Yan
    2026
    Preview abstract We present ALF (Advertiser Large Foundation model), a multi-modal transformer architecture for understanding advertiser behavior and intent across text, image, video and structured data modalities. Through contrastive learning and multi-task optimization, ALF creates unified advertiser representations that capture both content and behavioral patterns. Our model achieves state-of-the-art performance on critical tasks including fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces false positives by 90\% while maintaining 99.8\% precision on abuse detection tasks. The architecture's effectiveness stems from its novel combination of multi-modal transformations, intersample attention mechanism, spectrally normalized projections, and calibrated probabilistic outputs. View details
    Identifying Hearing Difficulty Moments in Conversational Audio
    Jack Collins
    Adrian Buzea
    Chris Collier
    Alejandro Ballesta Rosen
    Julian Maclaren
    Kelly Miles
    Simon Carlile
    Trends in Hearing (2026)
    Preview abstract Individuals regularly experience Hearing Difficulty Moments in everyday conversation. Identifying Hearing Difficulty Moments has particular significance in the field of hearing assistive technology where timely interventions are key for real-time hearing assistance. In this article, we propose and compare machine learning solutions for the temporal detection of segments containing Hearing Difficulty Moments in conversational audio. We show that audio language models, through their multimodal reasoning capabilities, can achieve state-of-the-art results for this task, significantly outperforming a simple automatic speech recognition (ASR) hotword heuristic and a more conventional fine-tuning approach with Wav2Vec, an audio-only input architecture that is state-of-the-art for ASR. View details
    Preview abstract Multimodal large language models (LLMs) integrate and process information from multiple modalities such as text, images, audio, and video, enabling complex tasks such as audio translation and visual question answering. While powerful, this complexity introduces novel vulnerabilities to sophisticated adversarial attacks. This survey paper provides a comprehensive overview of this rapidly expanding field, systematically categorizing attacks that range from manipulations of single modalities (e.g., perturbed images or audio) to those exploiting cross-modal interactions. We overview how these attacks exploit weaknesses in model fusion, attention mechanisms, and representation learning and provided analyses on their potential for real-world consequences. View details
    Preview abstract High-volume enterprise service organizations face a persistent challenge in transitioning from reactive support models to proactive, preventative ones. This paper introduces the Agentic Trend-to-Knowledge (ATK) methodology, a novel, autonomous framework designed to address this gap. The ATK methodology employs an AI agent that operates in a recurring, closed loop. It first uses a two-stage process for the autonomous thematic analysis of recent support cases to identify the most significant recurring issue. It then leverages Retrieval-Augmented Generation (RAG) to source relevant institutional knowledge. A key innovation is the agent's adaptive, bimodal response: if relevant knowledge is found, it drafts a proactive communication for human review; if a knowledge gap is detected, it autonomously creates a content creation task for the appropriate team. This transforms the agent from an automation tool into a proactive process owner that creates a virtuous cycle of continuous improvement for both case deflection and knowledge base quality. By automating the entire workflow from insight to action, the ATK framework provides a concrete methodology for shifting from a "human-in-the-loop" to a more strategic "human-on-the-loop" operational paradigm. View details
    Preview abstract Some artificial intelligence provisioning models that function as tools for human users or rely on labor arbitrage can present challenges for organizations, such as managing personnel rather than task outcomes and introducing data security risks. An architecture is described for an outcome-based synthetic labor market in which autonomous computational agents can be compensated based on verified task completion. The framework can leverage trusted execution environments to create secure hardware enclaves for processing sensitive data, which can render the data cryptographically inaccessible to a host system or agent provider. This approach can facilitate a secure, transactional market for autonomous professional execution, which may enable a shift from managing labor resources to procuring verified outcomes from a pool of specialized agents. View details
    Preview abstract As artificial intelligence (AI) is rapidly integrated into healthcare, ensuring that this innovation helps to combat health inequities requires engaging marginalized communities in health AI futuring. However, little research has examined Black populations’ perspectives on the use of AI in health contexts, despite the widespread health inequities they experience–inequities that are already perpetuated by AI. Addressing this research gap, through qualitative workshops with 18 Black adults, we characterize participants’ cautious optimism for health AI addressing structural well-being barriers (e.g., by providing second opinions that introduce fairness into an unjust healthcare system), and their concerns that AI will worsen health inequities (e.g., through health AI biases they deemed inevitable and the problematic reality of having to trust healthcare providers to use AI equitably). We advance health AI research by articulating previously-unreported health AI perspectives from a population experiencing significant health inequities, and presenting key considerations for future work. View details
    Preview abstract Source-to-source compilers may perform inefficiently by executing transpilation passes on scripts that do not contain the specific language features a pass is designed to transform, potentially leading to redundant processing. A compiler can analyze a script to generate a per-script feature map, for example, by identifying language features in its abstract syntax tree (AST). Before executing a transpilation pass, the compiler can check this map and may bypass the pass for that script if the specific feature targeted by the pass is not present. This feature map can also be dynamically updated throughout the compilation process as other passes transform the code. This method of conditional pass execution based on content-aware analysis may reduce redundant AST traversals, which could decrease overall compilation time and computational resource consumption. View details
    Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
    Han Zhou
    Shariq Iqbal
    Ivan Vulić
    Anna Korhonen
    International Conference on Learning Representations (ICLR) (2026)
    Preview abstract Large language models (LLMs), employed as multiple agents that interact and collaborate with each other, have excelled at solving complex tasks. The agents are programmed with {prompts} that declare their functionality, along with the {workflows} that orchestrate interactions within a structured flow. Designing prompts and workflows for multi-agent systems is inherently complex, especially when addressing a new task. It often demands expert-level knowledge and involves significant trial and error. Gaining a deep understanding of the factors that contribute to effective multi-agent systems is essential for automating the entire process. Motivated by this, we first conduct an in-depth analysis of the design spaces for multi-agent systems, focusing on the impact of prompts, scaling the number of agents, and common types of agentic modules. Our findings reveal that top-performing systems often emerge from simpler design spaces, where prompts play a critical role in enhancing agent functionality and enabling more effective scaling. Based on the insights, we propose Multi-Agent System Search (MASS), a multi-stage optimization framework that performs the optimization in a pruned design space, with prompts and an influential subset of modules. We show that MASS-optimized multi-agent systems outperform existing alterntives by a substantial margin. Based on the MASS-found systems, we finally propose design principles behind building effective multi-agent systems. View details
    Preview abstract When managing complex, unpredictable (non-deterministic) AI agents using simple, fixed control systems (like finite state machines), operational failures and accountability issues often arise. This document introduces a probabilistic governance and telemetry framework to resolve these problems. Instead of following a rigid sequence of steps, this framework defines a multi-dimensional operational boundary, a 'behavioral volume', and assigns the agent a goal. This allows the agent to use its own reasoning to achieve the goal while remaining within the defined boundaries. A separate telemetry layer monitors the agent's actions by calculating metrics, such as alignment scores and drift velocity, to measure how much the agent deviates from its intended behavior. This system provides a method for guiding, monitoring, and securing autonomous agents, effectively managing the performance and security of an unpredictable AI workforce in complex environments. View details
    SNPeek: Side-Channel Analysis for Privacy Applications on Confidential VMs
    Ruiyi Zhang
    Albert Cheu
    Adria Gascon
    Michael Schwarz
    Octavian Suciu
    Network and Distributed System Security (NDSS) (2026)
    Preview abstract Confidential virtual machines (CVMs) based on trusted execution environments (TEEs) enable new privacy-preserving solutions. But CVMs are not a privacy panacea, as they are vulnerable to side-channel attacks that may compromise confidentially of workloads. In this work, we develop the FARFETCH’D framework to help developers evaluate side-channel assisted privacy attacks that are broadly applicable to CVMs. The privacy reduction due to these attacks heavily depend on the execution environment and the workload, which varies vastly:What are avail-able attack primitives? How does the particular privacy work-load behave?This makes manual investigation and efficiently mitigating software-based side channels a cumbersome and impossible task. FARFETCH’D solves this challenge by providing a set of configurable attack primitives that can execute on real CVM hardware and automated ML-based analysis pipelines. We evaluate the effectiveness of FARFETCH’D on privacy-preserving workloads. Our results show that our approach is effective at pinpointing the vulnerability of privacy apps against side channels and help evaluating mitigation based on oblivious memory and differential privacy. View details

    Follow us

    ×