Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 11223 publications
Gaze Target Estimation Anywhere with Concepts
Xu Cao
Houze Yang
Vipin Gunda
Inki Kim
Jim Rehg
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)
Preview abstract
Estimating human gaze targets in-the-wild is a formidable challenge. Existing computer vision algorithms rely on brittle, multi-stage pipelines that require explicit inputs like head bounding boxes and human pose, causing initial detection errors to cascade and lead to system failure. To overcome this, we introduce the \textbf{Promptable Gaze Target Estimation (PGE)} task, a new end-to-end, concept-driven paradigm. PGE conditions gaze prediction on flexible user text or visual prompts (e.g., "the boy in the red shirt" or "person in point [0.52, 0.48]") to identify a specific subject's target, which eliminates the rigid dependency on intermediate localization cues. We develop a scalable data engine to generate \textbf{Gaze-Co}, a dataset and benchmark of 120K high-quality, prompt-annotated image pairs. We also propose \textbf{AnyGaze}, the first model designed for PGE. AnyGaze uses a Transformer-based detector to fuse features from frozen encoders and simultaneously solves subject localization, in/out-of-frame presence, and gaze target heatmap estimation. AnyGaze achieves state-of-the-art performance on standard gaze target estimation benchmarks, setting a strong baseline for this new problem even on a difficult out-of-domain, real-world clinical dataset. We will open-source the AnyGaze model and the Gaze-Co benchmark.
View details
"It didn’t feel right but I needed a job so desperately": Understanding People's Emotions & Help Needs During Financial Scams
G. Jake Chanenson
Tara Matthews
Jessica McClearn
Sarah Meiklejohn
Mia Hassoun
2026
Preview abstract
Online financial scams represent a long-standing and serious threat for which people seek help. We present a study to understand people’s in situ motivations for engaging with scams and the help needs they express before, during, and after encountering a scam. We identify the main emotions scammers exploited (e.g., fear, hope) and characterize how they did so. We examine factors—such as financial insecurity and legal precarity—which elevate people’s risk of engaging with specific scams and experiencing harm. We indicate when people sought help and describe their help-seeking needs and emotions at different stages of the scam. We discuss how these needs could be met through the design of contextually-specific prevention, diagnostic, mitigation, and recovery interventions.
View details
Preview abstract
Multimodal large language models (LLMs) integrate and process information from multiple modalities such as text, images, audio, and video, enabling complex tasks such as audio translation and visual question answering. While powerful, this complexity introduces novel vulnerabilities to sophisticated adversarial attacks. This survey paper provides a comprehensive overview of this rapidly expanding field, systematically categorizing attacks that range from manipulations of single modalities (e.g., perturbed images or audio) to those exploiting cross-modal interactions. We overview how these attacks exploit weaknesses in model fusion, attention mechanisms, and representation learning and provided analyses on their potential for real-world consequences.
View details
Usability Hasn’t Peaked: Exploring How Expressive Design Overcomes the Usability Plateau
Alyssa Sheehan
Bianca Gallardo
Ying Wang
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26), April 13–17, 2026, Barcelona, Spain (2026)
Preview abstract
Critics have argued that mobile usability has largely been optimized, and that only incremental gains are possible. We set out to explore if the newest generation of design systems, which promote greater flexibility and a return to design basics, could produce substantially more usable designs while maintaining or increasing aesthetic judgments. Through a study with 48 diverse participants completing tasks in 10 different applications, we found that in designs created following Material 3 Expressive guidelines, users fixated on the correct screen element for a task 33% faster, completed tasks 20% faster, and rated experiences more positively compared to versions designed using the previous Material design system. These improvements in performance and aesthetic ratings challenge the premise of a usability plateau and show that mobile usability has not peaked. We illustrate specific opportunities to make mobile experiences more usable by returning to design fundamentals while highlighting risks of added flexibility.
View details
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
Improving Low-Vision Chart Accessibility via On-Cursor Visual Context
Yotam Sechayk
Hennes Rave
Max Radler
Mark Colley
Ariel Shamir
Takeo Igarashi
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI 26)
Preview abstract
Despite widespread use, charts remain largely inaccessible for Low-Vision Individuals (LVI). Reading charts requires viewing data points within a global context, which is difficult for LVI who may rely on magnification or experience a partial field of vision. We aim to improve exploration by providing visual access to critical context. To inform this, we conducted a formative study with five LVI. We identified four fundamental contextual elements common across chart types: axes, legend, grid lines, and the overview. We propose two pointer-based interaction methods to provide this context: Dynamic Context, a novel focus+context interaction, and Mini-map, which adapts overview+detail principles for LVI. In a study with N=22 LVI, we compared both methods and evaluated their integration to current tools. Our results show that Dynamic Context had significant positive impact on access, usability, and effort reduction; however, worsened visual load. Mini-map strengthened spatial understanding, but was less preferred for this task. We offer design insights to guide the development of future systems that support LVI with visual context while balancing visual load.
View details
On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration
Yehonathan Refael
Amit Aides
Aviad Barzilai
Vered Silverman
Bolous Jaber
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 886-894
Preview abstract
Open-vocabulary object detection (OVD) models offer remarkable flexibility applications by enabling object detection from arbitrary text queries. Still, the zero-shot performance of the pre-trained models is hampered by the inherent semantic ambiguity of natural language, result to low precision, leading to insufficient crucial downstream applications. For instance, in the remote sensing (RS) domain, a query for "ship" can yield varied and contextually irrelevant results. To address this, for real time applications, we propose a novel cascaded architecture that synergizes the broad capabilities of a large, pre-trained OVD model with a lightweight, few-shot classifier. Our approach utilizes the frozen weights of the zero-shot model to generate initial, high-recall object-embedding proposals, which are then refined by a compact classifier trained in real-time on a handful of user-annotated examples. The core of our contribution is an efficient one step active learning strategy for selecting the most informative samples for user annotation. Our method identifies (extremely) small amount of an uncertain candidates near the theoretical decision boundary using density estimation and then applies clustering to ensure a diverse training set. This targeted sampling enables our cascaded system to elevate performance on standard remote sensing benchmarks. Our work thus presents a practical and resource-efficient framework for adapting foundational models to specific user needs, drastically reducing annotation overhead while achieving high accuracy without costly full-model fine-tuning.
View details
A Computer Vision Problem in Flatland
Erin Connelly
Annalisa Crannell
Timothy Duff
Rekha R. Thomas
SIAM Journal on Applied Algebra and Geometry, 10 (2026), pp. 14-45
Preview abstract
When is it possible to project two sets of labeled points of equal cardinality lying in a pair of projective planes to the same image on a projective line? We give a complete answer to this question, obtaining the following results. We first show that such a pair of projections exist if and only if the two point sets are themselves images of a common point set in projective space. Moreover, we find that for generic pairs of point sets, a common projection exists if and only if their cardinality is at most seven. In these cases, we give an explicit description of the loci of projection centers that enable a common image.
View details
MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR
Sieun Kim
Qianhui Zheng
Ruoyu Xu
Ravi Tejasvi
Anuva Kulkarni
Junyi Zhu
2026
Preview abstract
In Extended Reality (XR), complex acoustic environments often overwhelm users, compromising both scene awareness and social engagement due to entangled sound sources. We introduce MoXaRt, a real-time XR system that uses audio-visual cues to separate these sources and enable fine-grained sound interaction. MoXaRt's core is a cascaded architecture that performs coarse, audio-only separation in parallel with visual detection of sources (e.g. faces, instruments). These visual anchors then guide refinement networks to isolate individual sources, separating complex mixes of up to five concurrent sources (e.g. two voices + three instruments) with ca. 2 second processing latency. We validate MoXaRt through a technical evaluation on a new, complex dataset we collected, and a 22-participant user study. Our results demonstrate that MoXaRt significantly improves communication clarity—boosting listening comprehension in noisy conditions by 33.2% (p=0.0058)—and significantly reduces cognitive load (M=7.50 vs. M=3.36, p<0.001), paving the way for more perceptive and socially adept XR experiences.
View details
Reasoning-Driven Synthetic Data Generation and Evaluation
Tim R. Davidson
Benoit Seguin
Transactions on Machine Learning Research (2026)
Preview abstract
Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and time-consuming, leading model builders to increasingly consider synthetic data as a scalable alternative. However, existing synthetic data generation methods often rely on manual prompts, evolutionary algorithms, or extensive seed data from the target distribution — limiting their scalability, explainability, and control. In this paper, we introduce Simula: a novel reasoning-driven framework for data generation and evaluation. It employs a seedless, agentic approach to generate synthetic datasets at scale, allowing users to define desired dataset characteristics through an explainable and controllable process that enables fine-grained resource allocation. We show the efficacy of our approach on a variety of datasets, rigorously testing both intrinsic and downstream properties. Our work (1) offers guidelines for synthetic data mechanism design, (2) provides insights into generating and evaluating synthetic data at scale, and (3) unlocks new opportunities for developing and deploying AI in domains where data scarcity or privacy concerns are paramount.
View details
Preview abstract
Validating conversational artificial intelligence (AI) for regulated medical software applications may present challenges, as static test datasets and manual review may be limited in identifying emergent, conversational anomalies. A multi-agent AI system may be configured in a closed-loop for automated validation. The system can, for example, utilize an end user persona simulator agent to generate prompts for a target model and a domain /regulatory expert adjudicator agent to evaluate the target model’s responses against a configurable rubric. A meta-analysis agent can analyze anomalies to identify underlying vulnerabilities, which may then be used to programmatically synthesize new adversarial personas. This adaptive process can generate evidence to support regulatory compliance and continuous performance monitoring for medical software algorithms systems.
View details
KVCIS: Activation-Based Token Importance Prediction for Intelligent KV-Cache Compression
Zenodo (2026)
Preview abstract
We introduce KVCIS (KV-Cache Importance Scoring), a novel approach to KV-cache compression that predicts token importance from intermediate-layer activations before attention is computed. Unlike existing methods (H2O, StreamingLLM, Scissorhands) that make compression decisions based on attention scores computed during generation, KVCIS enables proactive compression at cache insertion time—determining how to store each token before paying the computational cost of attention. We discover a two-level importance structure in decoder-only transformers: the beginning-of-sequence (BOS) token acts as an "attention sink" receiving ~76% of attention, while the remaining ~24% is distributed across content tokens with 10-11× importance spread. A simple linear probe achieves R² = 0.998 overall and R² = 0.68–0.79 for discriminating among content tokens. Extensive validation across 3 model families (Llama, Mistral, Gemma), 8 layer depths, context lengths from 256 to 2048 tokens, and multiple downstream tasks demonstrates: 50% memory reduction with zero degradation on NarrativeQA (F1 = 0.064 matching baseline exactly), while uniform quantization degrades by 7.8% at the same compression ratio. KVCIS consistently achieves 5–8× better quality preservation than uniform quantization across all tested context lengths. The memory savings enable increased batch sizes and longer context support; the probe itself adds minimal overhead (~16KB direction vector, 0.06ms per token). This work extends activation-based probing from safety classification to inference optimization, demonstrating that intermediate-layer activations encode predictive signals about token importance for generation.
View details
Preview abstract
In some multi-stage software build pipelines, downstream compiler errors may be reported against ephemeral, machine-generated intermediate artifacts rather than original, human-written source code, which can make remediation challenging. A system and method may address this by intercepting a downstream error, mapping its location back to the original source file, and programmatically injecting a dormant suppression tag into the original source code. During a subsequent build, an intermediate transpiler can propagate this tag into a newly generated intermediate artifact. In the intermediate file, the tag may become active and be recognized by the downstream compiler as a directive to suppress the specific error. This approach can facilitate an automated remediation process for certain build failures that avoids direct modification of ephemeral files and uses the original source code as a record for suppression.
View details
Securing Elliptic Curve Cryptocurrencies against Quantum Vulnerabilities: Resource Estimates and Mitigations
Michael Broughton
Thiago Bergamaschi
Justin Drake
Dan Boneh
arXiv:2603.28846 (2026)
Preview abstract
This whitepaper seeks to elucidate implications that the capabilities of developing quantum architectures have on blockchain vulnerabilities and mitigation strategies. First, we provide new resource estimates for breaking the 256-bit Elliptic Curve Discrete Logarithm Problem, the core of modern blockchain cryptography. We demonstrate that Shor's algorithm for this problem can execute with either <1200 logical qubits and <90 million Toffoli gates or <1450 logical qubits and <70 million Toffoli gates. In the interest of responsible disclosure, we use a zero-knowledge proof to validate these results without disclosing attack vectors. On superconducting architectures with 1e-3 physical error rates and planar connectivity, those circuits can execute in minutes using fewer than half a million physical qubits. We introduce a critical distinction between fast-clock (such as superconducting and photonic) and slow-clock (such as neutral atom and ion trap) architectures. Our analysis reveals that the first fast-clock CRQCs would enable on-spend attacks on public mempool transactions of some cryptocurrencies. We survey major cryptocurrency vulnerabilities through this lens, identifying systemic risks associated with advanced features in some blockchains such as smart contracts, Proof-of-Stake consensus, and Data Availability Sampling, as well as the enduring concern of abandoned assets. We argue that technical solutions would benefit from accompanying public policy and discuss various frameworks of digital salvage to regulate the recovery or destruction of dormant assets while preventing adversarial seizure. We also discuss implications for other digital assets and tokenization as well as challenges and successful examples of the ongoing transition to Post-Quantum Cryptography (PQC). Finally, we urge all vulnerable cryptocurrency communities to join the ongoing migration to PQC without delay.
View details
Preview abstract
Generative AI (GenAI) is evolving from standalone tools to interconnected ecosystems that integrate chatbots, cloud platforms, and
third-party services. While this ecosystem model enables personalization and extended services, it also introduces complex information flows and amplifies privacy risks. Existing solutions focus on
system-level protections, offering little support for users to make
meaningful privacy choices. To address this gap, we conducted two
vignette-based survey studies with 486 participants and a followup interview study with 16 participants. We also explored users’
needs and preferences for privacy choice design across both GenAI
personalization and data-sharing. Our results reveal paradoxical
patterns: participants sometimes trusted third-party ecosystems
more for personalization but perceived greater control in first-party
ecosystems when data was shared externally. We discuss design implications for privacy choice interfaces that enhance transparency,
control, and trust in GenAI ecosystems.
View details