Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11342 publications
Preview abstract While non-verbal behaviors and expressive movements are essential for natural human-robot interaction, existing methods often overlook a crucial element: the human’s internal cognitive state. Consequently, proactive multi-agent systems frequently interrupt humans at inopportune moments, leading to cognitive overload and decreased task performance. This paper introduces a framework for generating “cognitively aligned” multi-agent interactions, enhancing the ability of robotic systems to contextually defer communications during moments of high human mental workload. We present the design and implementation of a closed-loop architecture that explores the interplay between autonomous task execution and real-time neurophysiological focus. Utilizing a consumer-grade Brain-Computer Interface (BCI), our approach continuously monitors Electroencephalography (EEG) spectral band powers while a human performs a cognitive-load-inducing task. We propose a workload-driven pipeline where an HTTP-based signaling mechanism places a primary agent’s sensory inputs and audio outputs into a holding state upon detecting high cognitive load. This allows secondary agents to seamlessly process complex, delegated tasks in the background. Once the human’s cognitive state returns to a baseline, the primary agent releases the queued agent message. Our preliminary results demonstrate the feasibility of leveraging real-time signal processing, Large Language Models (LLMs), and physical robotic embodiments to create interrupt-aware, non-intrusive multi-agent systems. View details
Who Controls the Curriculum for AI? The Limits of Participatory Design for Educational AI
Michael Madaio
Learning Under Algorithmic Conditions, University of Minnesota Press (2026)
Preview abstract Participatory design is a long-standing effort to shift control over technology design from technologists to users and communities impacted by technologies. For educational AI, this means involving students, families, teachers, and other stakeholders in shaping the design of AI systems. While promising, in this article, I situate the recent calls for participatory design of educational AI systems within a different historical tradition—that of contests over local control of educational curricula. I argue that approaches that attempt to steer the design and development of educational AI through participatory methods may inadvertently reproduce the history of political contestation of educational curricula, in ways that may privilege the most powerful communities, rather than those inequitably impacted. What might it look like to treat participatory AI design as a site for political contestation? How might these approaches avoid reproducing the same majoritarian tendencies that led to educational inequities in the first place? View details
Unveiling the Global Landscape of Android Security Updates
Haiyun Deng
Abbas Acar
Esteban Luques
Harun Oz
Ahmet Aris
Selcuk Uluagac
IEEE Transactions on Dependable and Secure Computing (2026)
Preview abstract Android is the world’s leading mobile operating system, with over three billion active devices. Detecting vulnerabilities and ensuring timely patch deployment are critical to maintaining security. The Android Open Source Project (AOSP) has enhanced the transparency of security updates through Security Patch Levels. However, challenges related to update speed and availability persist. In 2022, Google reported that half of the zero-day vulnerabilities discovered in the wild were variations of vulnerabilities that had already been patched. Recent research mainly highlights delays in update distribution, often attributing them to fragmentation and focusing primarily on flagship devices or limited time-frames. Our approach takes a device-centric perspective to investigate Android update patterns, analyzing 567K security update records from 2014 to 2024, covering 904 distinct devices from six key Original Equipment Manufacturers (OEMs) across 98 countries. Our extensive analysis revealed notable differences in update release timing across OEMs, device types, and regions. Our study also examines documented vulnerabilities and weaknesses, while assessing OEM compliance with Android security guidelines. Our study shows that ∼89.7% of vulnerabilities on unpatched Android devices are exploitable without user interaction and with low attack complexity. We also identified delays linked to fragmentation and OEM-specific challenges, and provide actionable insights for improvement. View details
Progressive Photorealistic Simplification
Adi Rosenthal
Yedid Hoshen
Arik Shamir
2026
Preview abstract Existing image simplification techniques often rely on Non-Photorealistic Rendering (NPR), transforming photographs into stylized sketches, cartoons, or paintings. While effective at reducing visual complexity, such approaches typically sacrifice photographic realism. In this work, we explore a complementary direction: simplifying images while preserving their photorealistic appearance. We introduce progressive semantic image simplification, a framework that iteratively reduces scene complexity by removing and inpainting elements in a controlled manner. At each step, the resulting image remains a plausible natural photograph. Our method combines semantic understanding with generative editing, leveraging Vision-Language Models (VLMs) to identify and prioritize elements for removal, and a learned verifier to ensure photorealism and coherence throughout the process. This is implemented via an iterative \emph{Select–Remove–Verify} pipeline that produces high-quality simplification trajectories. To improve efficiency, we further distill this process into an image-to-video generation model that directly predicts coherent simplification sequences from a single input image. Beyond generating cleaner and more focused compositions, our approach enables applications such as content-aware decluttering, semantic layer decomposition, and interactive editing. More broadly, our work suggests that simplification through structured content removal can serve as a practical mechanism for guiding visual interpretation within the photorealistic domain, complementing traditional abstraction methods. View details
Identifying Hearing Difficulty Moments in Conversational Audio
Jack Collins
Adrian Buzea
Chris Collier
Alejandro Ballesta Rosen
Julian Maclaren
Kelly Miles
Simon Carlile
Trends in Hearing (2026)
Preview abstract Individuals regularly experience Hearing Difficulty Moments in everyday conversation. Identifying Hearing Difficulty Moments has particular significance in the field of hearing assistive technology where timely interventions are key for real-time hearing assistance. In this article, we propose and compare machine learning solutions for the temporal detection of segments containing Hearing Difficulty Moments in conversational audio. We show that audio language models, through their multimodal reasoning capabilities, can achieve state-of-the-art results for this task, significantly outperforming a simple automatic speech recognition (ASR) hotword heuristic and a more conventional fine-tuning approach with Wav2Vec, an audio-only input architecture that is state-of-the-art for ASR. View details
LLM-Powered Analysis of IoT User Reviews: Tracking and Ranking Security and Privacy Concerns
Taufiq Islam Protick
Anupam Das
Proceedings of the International AAAI Conference on Web and Social Media (ICWSM) (2026)
Preview abstract Being able to understand the security and privacy (S&P) concerns of IoT users brings benefits to both developers and users. To learn about users' views, we examine Amazon IoT reviews - one of the biggest IoT markets. This work presents a state-of-the-art methodology to identify and categorize reviews in which users express S&P concerns. We developed an automated pipeline by fine-tuning GPT-3.5-Turbo to build two models: the Classifier-Rationalizer-Categorizer and the Thematic Mapper. By leveraging dynamic few-shot prompting and the model's large context size, our pipeline achieved over 97% precision and recall, significantly outperforming keyword-based and classical ML methods. We applied our pipeline to 91K Amazon reviews about fitness trackers, smart speakers and cameras, over multiple years. We found that on average 5% contained S&P concerns, while security camera exhibited the highest prevalence at 10%. Our method detected significantly more S&P-relevant reviews than prior works: 15x more for fitness trackers, 29% more for smart speakers, and 70% more for cameras. Our longitudinal analysis reveals that concerns like surveillance and data control have persisted for years, suggesting limited industry progress. We demonstrate that across all device types, users consistently demand more precise control over what data is collected and shared. We uncover challenges in multi-user and multi-device interactions, identifying two previously unreported themes concerning inadequate controls for account separation and data access. These findings, ranging from broad persistent trends to specific instances of customer loss, offer actionable insights for developers to improve user satisfaction and trust. View details
Preview abstract The emergence of Agentic AI—autonomous systems capable of reasoning, decision-making, and multi-step execution—represents a paradigm shift in enterprise technology. Moving beyond simple generative tasks, these agents offer the potential to solve long-standing industry pain points, with over 90% of enterprises planning integration within the next three years. However, the transition from successful proof-of-concept (PoC) to a resilient, production-grade system presents significant hurdles. This article categorizes these challenges into three primary domains: Technical and Engineering Hurdles: Issues such as "entangled workflows" that complicate debugging, the struggle to maintain output quality and mitigate hallucinations, and the unpredictability caused by shifting underlying models or data sources. People, Process, and Ecosystem Hurdles: The high operational costs and unclear ROI of large models, the necessity of a new "Agent Ops" skillset, the complexity of integrating agents with disparate enterprise systems, and a rapidly evolving regulatory landscape. The Pace of Change and Security risks: The technical debt incurred by shifting software frameworks and the expanded attack surface created by autonomous agents. The article concludes that successful deployment requires a shift from informal "vibe-testing" to rigorous engineering discipline. By adopting code-first frameworks, establishing robust evaluation metrics (KPIs), and prioritizing functional deployment over theoretical optimization, organizations can effectively manage the lifecycle of Agentic AI and realize its transformative business value. View details
Preview abstract This paper introduces Operationalized Temporal Entity Resolution, a distributed system architecture designed to resolve data consistency challenges in modern Security Information and Event Management (SIEM) environments. processing petabytes of high-velocity telemetry. We address the critical failure mode of ”State Smearing”—a temporal discrepancy between an entity’s state at event time versus analysis time—which frequently corrupts forensic timelines, particularly regarding ephemeral assets like containers and DHCP leases. Our approach coalesces heterogeneous data from diverse log sources into a single, canonical representation, processing over 2 billion entity fragments daily. By leveraging a deterministic Dynamic Graph Resolution via modified distributed connected components and a novel Density-Aware Temporal Checkpointing algorithm, we generate precise validity intervals. This method embeds temporal state directly into the resolution graph, eliminating the need for computationally expensive query-time joins. Ultimately, this architecture enables security analysts to perform ”time-travel” queries to reconstruct historical states accurately. Analysis of a production environment demonstrates that 8–16% of threat detection rules critically depend on this enriched temporal merging. View details
Fair Allocation of Indivisible Goods with Variable Groups
Paul Golz
Warut Suksompong
Ayumi Igarashi
AAAI (2026)
Preview abstract We study the fair allocation of indivisible goods with variable groups. In this model, the goal is to partition the agents into groups of given sizes and allocate the goods to the groups in a fair manner. We show that for any number of groups and corresponding sizes, there always exists an envy-free up to one good (EF1) outcome, thereby generalizing an important result from the individual setting. Our result holds for arbitrary monotonic utilities and comes with an efficient algorithm. We also prove that the EF1 existence can be guaranteed even when the goods lie on a path and each group must receive a connected bundle. In addition, we consider a probabilistic model where the utilities are additive and drawn randomly from a distribution. We show that if there are n agents and the number of goods m is divisible by the number of groups k, then an envy-free outcome exists with high probability if m = ω(log n), and this bound is tight. On the other hand, if m is not divisible by k, then an envy-free outcome is unlikely to exist as long as m = o(√n). View details
VISTA: A Test-Time Self-Improving Video Generation Agent
Xuan Long Do
Hootan Nakhost
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (to appear) (2026)
Preview abstract Despite rapid advances in text-to-video (T2V) synthesis, generated video quality remains critically dependent on precise user prompts. Existing test-time optimization methods, successful in other domains, struggle with the multi-faceted nature of video. To address this, we introduce VISTA, a novel multi-agent system that autonomously refines prompts to improve video generation. VISTA operates in an iterative loop, first decomposing a user's idea into a structured temporal plan. After generation, the best video is identified through a robust pairwise tournament. This winning video is then critiqued by a trio of specialized agents focusing on visual, audio, and contextual fidelity. Finally, a reasoning agent synthesizes this feedback to introspectively rewrite and enhance the prompt for the next generation cycle. To rigorously evaluate our proposed approach, we introduce MovieGen-Bench, a new benchmark of diverse single- and multi-scene video generation tasks. Experiments show that while prior methods yield inconsistent gains, VISTA consistently improves video quality, achieving up to 60% pairwise win rate against state-of-the-art baselines. Human evaluators concur, preferring VISTA's outputs in 68% of comparisons. View details
Reasoning-Driven Synthetic Data Generation and Evaluation
Tim R. Davidson
Benoit Seguin
Transactions on Machine Learning Research (2026)
Preview abstract Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and time-consuming, leading model builders to increasingly consider synthetic data as a scalable alternative. However, existing synthetic data generation methods often rely on manual prompts, evolutionary algorithms, or extensive seed data from the target distribution — limiting their scalability, explainability, and control. In this paper, we introduce Simula: a novel reasoning-driven framework for data generation and evaluation. It employs a seedless, agentic approach to generate synthetic datasets at scale, allowing users to define desired dataset characteristics through an explainable and controllable process that enables fine-grained resource allocation. We show the efficacy of our approach on a variety of datasets, rigorously testing both intrinsic and downstream properties. Our work (1) offers guidelines for synthetic data mechanism design, (2) provides insights into generating and evaluating synthetic data at scale, and (3) unlocks new opportunities for developing and deploying AI in domains where data scarcity or privacy concerns are paramount. View details
Productionizing Quantum Mass Production
Bill Huggins
Nathan Wiebe
arXiv for now (2026) (to appear)
Preview abstract For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step. View details
Preview abstract PURPOSE: To introduce Cardio Load (CL), a metric quantifying cardiovascular work from all activities across the day, and to investigate its distribution by age, gender, and workout profiles. CL adapts the Training Impulse (TRIMP) model by leveraging continuous heart rate and movement data from wearables, enabling minute-level intensity estimation. We also discuss the derivation of weekly target loads, intended to guide fitness maintenance. METHODS: A retrospective analysis was conducted on 31.2 million hours of wrist-worn wearable data collected over a six-week period. The dataset comprised a 40,000-subject subset (37.9% female) of consenting Google Pixel Watch® users in the United States, aged 18 to 80 years (18-39: 41.8%, 40-59: 43.5%, 60+: 14.6%). Measured data included minute-interval heart rate averages, resting and maximum heart rates, minute-interval averaged accelerometer log energy, and manually-logged or auto-detected activity types. Cardio Load scores and target loads were calculated daily for each subject and compared across age and gender. We also compared the proportions of CL gained during workouts and incidental daily activities for these groups. RESULTS: Overall, the study population's mean ± SD weekly CL scores were 221 ± 156 (female) and 259 ± 169 (male). Median weekly Cardio Load (CL) values exhibited consistency for individuals between 30 and 75 years of age. When analyzed in five-year age groups, the coefficient of variation (CV%) of median weekly CL values within this age range was less than 4.5%, with younger and older subjects demonstrating higher and lower median CL, respectively. The median proportion of CL accumulated during structured workouts versus incidental daily activity was 41.0% (female) and 49.0% (male) for all subjects, though this varied considerably with average weekly workout duration. CV% of weekly target load and daily target load over 6 weeks was 23.6% and 35.2% respectively. CONCLUSION: Cardio Load provides a continuous quantification of activity load from wearables, acknowledging both structured workouts and everydayincidental activity. CL is equitably rewarded for age ranges spanning 30-75 years. Weekly target loads were found to have little measurement variability and be more consistent and, consequently, more practical for planning training and physical activity than daily targets. View details
On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration
Yehonathan Refael
Amit Aides
Aviad Barzilai
Vered Silverman
Bolous Jaber
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 886-894
Preview abstract Open-vocabulary object detection (OVD) models offer remarkable flexibility applications by enabling object detection from arbitrary text queries. Still, the zero-shot performance of the pre-trained models is hampered by the inherent semantic ambiguity of natural language, result to low precision, leading to insufficient crucial downstream applications. For instance, in the remote sensing (RS) domain, a query for "ship" can yield varied and contextually irrelevant results. To address this, for real time applications, we propose a novel cascaded architecture that synergizes the broad capabilities of a large, pre-trained OVD model with a lightweight, few-shot classifier. Our approach utilizes the frozen weights of the zero-shot model to generate initial, high-recall object-embedding proposals, which are then refined by a compact classifier trained in real-time on a handful of user-annotated examples. The core of our contribution is an efficient one step active learning strategy for selecting the most informative samples for user annotation. Our method identifies (extremely) small amount of an uncertain candidates near the theoretical decision boundary using density estimation and then applies clustering to ensure a diverse training set. This targeted sampling enables our cascaded system to elevate performance on standard remote sensing benchmarks. Our work thus presents a practical and resource-efficient framework for adapting foundational models to specific user needs, drastically reducing annotation overhead while achieving high accuracy without costly full-model fine-tuning. View details
×