Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11321 publications
    CrossCheck: Input Validation for WAN Control Systems
    Rishabh Iyer
    Isaac Keslassy
    Sylvia Ratnasamy
    Networked Systems Design and Implementation (NSDI) (2026) (to appear)
    Preview abstract We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages. Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data). View details
    Preview abstract Artificial intelligence is rapidly evolving, marked by the emergence of Large Language Model (LLM) agents – systems capable of complex reasoning, planning, and interaction with digital and physical environments. These agents, powered by advancements in LLMs, demonstrate remarkable capabilities across diverse domains, including finance, healthcare, web navigation, software development, and daily task assistance. Unlike traditional AI systems, LLM agents can perceive their surroundings, formulate multi-step plans, utilize external tools and APIs, access memory or knowledge bases, and execute actions to achieve specified goals. This ability to act upon the world, however, introduces significant safety and security challenges. The safety paradigms developed for traditional LLMs, primarily focused on mitigating harmful textual outputs (e.g., toxicity, bias), are insufficient for safeguarding LLM agents. Agents interacting with dynamic environments and executing actions present a broader attack surface and new categories of risk. These include performing unsafe operations, violating privacy constraints through improper data handling or access control failures, deviating from user objectives (task misalignment), and susceptibility to novel manipulation techniques like indirect prompt injection and memory poisoning. Ensuring the trustworthy operation of these powerful agents is paramount, especially as they are integrated into high-stakes applications. To address this critical challenge, we introduce VeriGuard, a novel framework designed to enhance the safety and reliability of LLM agents by interactively verifying their policies and the actions. VeriGuard integrates a verification module that intercepts code-based actions proposed by the agent. In the first step, VeriGuard will generates and verifies the policies. The policies are rigorously checked against a set of predefined safety and security specifications Then each action will be verified to make sure it will align with the agent specification. This interactive verification loop ensures that the agent's behavior remains within safe operational bounds, effectively preventing the execution of harmful or unintended operations. By verifying each step, VeriGuard provides a robust safeguard, substantially improving the trustworthiness of LLM agents in complex, real-world environments. View details
    Differential Sensitivity of Impedance Plethysmography and Photoplethysmography Sensors to Temperature-Induced Peripheral Vasoconstriction
    Seobin Jung
    Alexandros Pantelopoulos
    Lindsey Sunden
    Pete Richards
    Shwetak Patel
    Sam Sheng
    Scientific Reports (2026)
    Preview abstract Impedance plethysmography (IPG) and photoplethysmography (PPG) are non-invasive techniques for measuring blood volume changes. This study investigated the differential responses of IPG and PPG to temperature-mediated vasoconstriction induced by localized cooling. Twenty-one participants underwent control and treatment conditions, with fake or real ice cubes applied to the forearm. Blood pressure remained stable, while heart rate decreased. PPG signal amplitude significantly decreased with cooling (p_adj = 0.004), indicating sensitivity to superficial blood flow changes. In contrast, IPG signal amplitude remained stable (p_adj = 1.0). No statistically significant differences were observed in timing-derived metrics. These findings suggest IPG is less sensitive to superficial changes in blood flow than PPG, and may be more suitable for monitoring deeper blood flow. This study provides insights into the distinct sensitivities of IPG and PPG, with implications for wearable device development and cardiovascular monitoring. View details
    Neural general circulation models for modeling precipitation
    Stephan Hoyer
    Dmitrii Kochkov
    Janni Yuval
    Ian Langmore
    Science Advances (2026)
    Preview abstract Climate models struggle to accurately simulate precipitation, particularly extremes and the diurnal cycle. While hybrid models combining machine learning and physics have emerged with the premise of improving precipitation simulations, none have proven sufficiently skillful or stable enough to outperform existing models in simulating precipitation. Here, we present the first hybrid model that is trained directly on precipitation observations. The model runs at 2.8 degrees resolution and is built on the differentiable NeuralGCM framework. This model is stable for decadal simulations and demonstrates significant improvements over existing GCMs, ERA5 reanalysis, and a Global Cloud-Resolving Model in simulating precipitation. Our approach yields reduced biases, a more realistic precipitation distribution, improved representation of extremes, and a more accurate diurnal cycle. Furthermore, it outperforms the ECMWF ensemble for mid-range weather forecasting. This advance paves the way for more reliable simulations of current climate and for the ability to fully utilize the abundance of existing observations to further improve GCMs. View details
    Preview abstract Browser fingerprinting is the practice of tracking users across the Web by collecting attributes from their devices and combining them to create unique identifiers. This practice poses major privacy risks to users, and more than a decade of research has quantified fingerprinting risks due to various attributes, leading browser developers to implement many privacy-enhancing changes. Early work used Shannon entropy to quantify risks. However, Shannon entropy can grow with dataset size, limiting the ability to compare datasets and results. Researchers then introduced normalized entropy as a measure for comparing browser fingerprinting datasets of different sizes and numerous works followed using normalized entropy for this purpose. We identify and address a resulting problem in the fingerprinting literature. We show normalized entropy is ill-suited to compare datasets of different sizes — it decreases as dataset size increases. We show this both analytically and empirically, leveraging a recently published dataset of browser attributes commonly used for fingerprinting. Given the unmet need for a better fingerprinting risk measure, we define a minimal set of desired properties for such a measure: scale-invariance, monotonicity and estimability. We then propose to use Tsallis entropy as a more interpretable fingerprinting risk measure. We evaluate Shannon, normalized, and Tsallis entropy with respect to the properties, and prove that only Tsallis entropy satisfies all of them. View details
    LiveSVG: Zero-Shot SVG Animation via Video Generation
    Matan Levy
    Ran Margolin
    Bar Cavia
    Dvir Samuel
    Shmuel Peleg
    Alex Rav Acha
    Arik Shamir
    Dani Lischinski
    Google (2026)
    Preview abstract We introduce LiveSVG, a zero-shot approach for generating Scalable Vector Graphics (SVG) animations using video diffusion models. Current SVG animation methods struggle with complex motions: LLM-based code synthesis fails to express fine, non-rigid Bézier deformations, while Score Distillation Sampling (SDS) provides noisy gradients and often requires category-specific priors like skeletons. In contrast, LiveSVG fits vector geometry directly to an explicitly generated target video. Given an input SVG image and a motion prompt, we generate a previewable target video using a frozen image-to-video model, then fit the original SVG to this video via differentiable rendering. Our fitting stage is skeleton-free, utilizing a dual-level motion representation that combines per-group homographies for coarse articulation with per-path Bézier control-point offsets for local deformations. To resolve color-induced correspondence ambiguities during pixel-wise fitting, we introduce a novel sphere-packing recolorization strategy. We also present ChallengeSVG, a benchmark of complex, multi-object scenes that exposes the limitations of prior work. Evaluations demonstrate that LiveSVG significantly outperforms existing methods on both AniClipart and ChallengeSVG, establishing direct reference-video fitting as a practical, robust route to prompt-aligned and fully editable vector animation. View details
    An Empirical Study of Tablet Ergonomics: The Interplay of Temperature, Orientation, and Use Behaviors
    Carmen Van Ommen
    Mikki Phan
    Arun Raghupathy
    Daniel Huynh
    Barbara Chaparro
    Ergonomics in Design: The Quarterly of Human Factors Applications Journal (2026)
    Preview abstract To balance computational performance with thermal comfort, this study explores a consolidated hotspot architecture at the top center of a tablet. We tested hotspot (39°C, 43°C, 45°C, 47°C) and ambient temperatures (25°C, 35°C) with 60 participants, measuring perception, action likelihood, and expectation. The hotspot was observed away from high contact areas, with 43°C identified as the threshold for significant discomfort. Discomfort increased with portrait mode use and higher device and ambient temperatures, while active use duration influenced acceptability. The findings underscore the importance of thermal mapping and contextual sensing, with direct applications for software throttling thresholds of coated aluminum enclosures. View details
    A Dynamic Numerical Model for Real-Time Estimation of Latent Cognitive States Using Oculomotor Metrics
    Diako Mardanbegi
    ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 22087-22091
    Preview abstract Estimating internal cognitive states from oculomotor data is fundamentally challenging due to their context-dependency and the complex relationship between various metrics. This paper proposes a dynamic numerical framework to model a task-specific behavioral signature and monitor deviations from it in real-time. The model integrates key oculomotor and motion metrics into a differential equation, yielding a continuous score that serves as an estimate of a latent cognitive state. By using tunable, heuristic parameters, our approach offers a transparent and adaptable alternative to opaque machine learning models. The framework’s strength lies in its ability to pinpoint objective changes in behavior, providing a potential tool for interpreting events like the onset of fatigue, distraction, or cognitive load. View details
    Preview abstract There are growing concerns about AI-generated image-based sexual abuse (AI-IBSA), also known as nonconsensual sexualized ′deepfakes.′ Empirical research on AI-IBSA, however, remains very limited. This study surveyed 7231 respondents across Australia, the United Kingdom, and the United States to investigate community attitudes and perceptions on AI-IBSA. Through a vignette study, we explored the relationship between public familiarity with AI-IBSA, normative concerns about consent, and context-dependent judgments that vary based on the target's identity relational status, and how the content was used. Our findings reveal strong condemnation of AI-IBSA, yet respondents demonstrated low familiarity with the technology and their views varied depending on particular contexts. AI-IBSA targeting intimate partners was viewed as more unacceptable than targeting celebrities, and content created solely for personal use was seen as less unacceptable than content intended for distribution. The study highlights the need for approaches that go beyond technical fixes and punitive measures, advocating for a multifaceted response that integrates ethical data governance, digital sexual literacy, and restorative justice approaches. View details
    Improved Differentially Private Algorithms for Rank Aggregation
    Phanu Vajanopath
    Quentin Hillebrand
    Vorapong Suppakitpaisarn
    AAAI (2026)
    Preview abstract Rank aggregation is a task of combining the rankings of items from multiple users into a single ranking that best represents the users' rankings. Alabi et al. (AAAI'22) presents differentially-private (DP) polynomial-time approximation schemes (PTASes) and 5-approximation algorithms with certain additive errors for the Kemeny rank aggregation problem in both central and local models. In this paper, we present improved DP PTASes with smaller additive error in the central model. Furthermore, we are first to study the footrule rank aggregation problem under DP. We give a near-optimal algorithm for this problem; as a corollary, this leads to 2-approximation algorithms with the same additive error as the 5-approximation algorithms of Alabi et al. for the Kemeny rank aggregation problem in both central and local models. View details
    Preview abstract Generative AI’s humanlike qualities are driving its rapid adoption in professional domains. However, this anthropomorphic appeal raises concerns from HCI and responsible AI scholars about potential hazards and harms, such as overtrust in system outputs. To investigate how technology workers navigate these humanlike qualities and anticipate emergent harms, we conducted focus groups with 30 professionals across six job functions (ML engineering, product policy, UX research and design, product management, technology writing, and communications). Our findings reveal an unsettled knowledge environment surrounding humanlike generative AI, where workers’ varying perspectives illuminate a range of potential risks for individuals, knowledge work fields, and society. We argue that workers require comprehensive support, including clearer conceptions of “humanlikeness” to effectively mitigate these risks. To aid in mitigation strategies, we provide a conceptual map articulating the identified hazards and their connection to conflated notions of “humanlikeness.” View details
    Preview abstract The current pursuit of robust machine intelligence is largely predicated on a substrate independent, computational functionalist view of cognition, where sufficiently complex computational processing is expected to eventually yield generalized reasoning. This paper explores the ontological distinctions between these computational frameworks and biological cognition, specifically how these differences impact the capacity for semantic understanding. By analyzing phenomena such as the "reversal curse" where models fail to generalize the symmetry in identity relations (A=B implies B=A), and performance on novel reasoning benchmarks (e.g., ARC-AGI), this paper examines whether current model limitations are transient artifacts of scale or indicative of a distinct architectural category. Integrating Stevan Harnad’s “symbol grounding problem” with Evan Thompson’s biological model of “intrinsic normativity,” I investigate whether robust general intelligence might require sense-making: a process distinct from information processing, whereby an agent’s internal states are causally coupled with its environment via survival or system-wide stakes which grounds symbols in meaning. Current Large Language Models (LLMs) appear to lack this intrinsic normativity, and consequently may operate primarily as epistemic instruments rather than ontic agents. By introducing the concept of “ontic grounding”, this paper presents a potential framework for distinguishing between the simulation of reasoning and true understanding, which could have implications for AI safety and governance. View details
    Preview abstract Mid-air gestures in Extended Reality (XR) often lead to fatigue, discomfort and imprecision, limiting their suitability for extended use. Surface-based interactions offer a compelling alternative, providing improved accuracy, speed, and comfort. However, current egocentric vision-based methods struggle with reliable surface inputs due to challenges in hand tracking and surface-plane estimation from oblique and occluded viewing angles. To this extent, we introduce SurfaceXR, a novel sensor fusion approach that combines headset based hand tracking with micro-vibration data sampled from commodity smartwatch IMUs to enable precise and robust inputs on arbitrary surfaces. Our system is designed with flexibility in mind - it can function using only hand tracking, only IMU sensing, or optimally with both modalities combined. Our user study across 12 participants validates SurfaceXR's effectiveness in augmenting surface touch tracking and 8 class hand-surface gesture recognition, demonstrating significant improvements over single-modality approaches. Enabled by SurfaceXR, we demonstrate a series of interactive apps for both AR and VR, ranging from on-surface sketching, text entry and gesture based navigation. View details
    Preview abstract This defensive publication describes a framework for multi-artificial intelligence (AI) orchestration that can be used to address potential limitations associated with reliance on single AI models, such as correlated systemic failures or cognitive blind spots. The described system is a cognitive orchestration framework that can function as a middleware layer to manage tasks across a heterogeneous ensemble of AI models. An orchestrator node can decompose a user request into a sequence of sub-tasks, which an arbitrage engine may then dynamically assign to suitable AI models based on certain factors, such as capability, cost, and latency. For certain tasks, such as those designated as high-risk, a byzantine consensus layer can route the task to multiple diverse models in parallel and may trigger a process, for example a 'cognitive debate,' which could be adjudicated by a third-party judge model to help resolve conflicting outputs. This framework can facilitate a more resilient system that may improve the accuracy and reliability of outputs when compared to some single-model architectures. View details
    Preview abstract Being able to understand the security and privacy (S&P) concerns of IoT users brings benefits to both developers and users. To learn about users' views, we examine Amazon IoT reviews - one of the biggest IoT markets. This work presents a state-of-the-art methodology to identify and categorize reviews in which users express S&P concerns. We developed an automated pipeline by fine-tuning GPT-3.5-Turbo to build two models: the Classifier-Rationalizer-Categorizer and the Thematic Mapper. By leveraging dynamic few-shot prompting and the model's large context size, our pipeline achieved over 97% precision and recall, significantly outperforming keyword-based and classical ML methods. We applied our pipeline to 91K Amazon reviews about fitness trackers, smart speakers and cameras, over multiple years. We found that on average 5% contained S&P concerns, while security camera exhibited the highest prevalence at 10%. Our method detected significantly more S&P-relevant reviews than prior works: 15x more for fitness trackers, 29% more for smart speakers, and 70% more for cameras. Our longitudinal analysis reveals that concerns like surveillance and data control have persisted for years, suggesting limited industry progress. We demonstrate that across all device types, users consistently demand more precise control over what data is collected and shared. We uncover challenges in multi-user and multi-device interactions, identifying two previously unreported themes concerning inadequate controls for account separation and data access. These findings, ranging from broad persistent trends to specific instances of customer loss, offer actionable insights for developers to improve user satisfaction and trust. View details
    ×