Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Generative AI is reshaping software development, yet its psychological impact remains under-researched. During May and August 2025, we conducted a reflexive thematic analysis of interviews with 12 senior engineers (≥5 years' experience) recruited from Western technology hubs to explore shifts in professional identity. We identify a central transition from "coder to conductor," where AI acts as a cognitive partner. Key findings include: (1) a re-architecting of focus from implementation to strategy; (2) a shift in productivity metrics from output to impact; and (3) a dual impact on agency, where AI empowers autonomy but threatens competence through de-skilling anxieties. These findings suggest that as implementation becomes commoditised, organisational training and career progression must prioritise architectural mastery and metacognitive oversight to ensure sustained developer motivation and system integrity.
View details
Usability Hasn’t Peaked: Exploring How Expressive Design Overcomes the Usability Plateau
Alyssa Sheehan
Bianca Gallardo
Ying Wang
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26), April 13–17, 2026, Barcelona, Spain (2026)
Critics have argued that mobile usability has largely been optimized, and that only incremental gains are possible. We set out to explore if the newest generation of design systems, which promote greater flexibility and a return to design basics, could produce substantially more usable designs while maintaining or increasing aesthetic judgments. Through a study with 48 diverse participants completing tasks in 10 different applications, we found that in designs created following Material 3 Expressive guidelines, users fixated on the correct screen element for a task 33% faster, completed tasks 20% faster, and rated experiences more positively compared to versions designed using the previous Material design system. These improvements in performance and aesthetic ratings challenge the premise of a usability plateau and show that mobile usability has not peaked. We illustrate specific opportunities to make mobile experiences more usable by returning to design fundamentals while highlighting risks of added flexibility.
View details
SNPeek: Side-Channel Analysis for Privacy Applications on Confidential VMs
Ruiyi Zhang
Albert Cheu
Adria Gascon
Michael Schwarz
Octavian Suciu
Network and Distributed System Security (NDSS) (2026)
Confidential virtual machines (CVMs) based on trusted execution environments (TEEs) enable new privacy-preserving solutions. But CVMs are not a privacy panacea, as they are vulnerable to side-channel attacks that may compromise the confidentiality of workloads.
In this work, we develop the FARFETCH’D framework to help developers evaluate side-channel-assisted privacy attacks that are broadly applicable to CVMs. The privacy reduction due to these attacks depends heavily on the execution environment and the workload, which vary widely: What are the available attack primitives? How does the particular privacy workload behave? This makes manually investigating and efficiently mitigating software-based side channels a cumbersome, if not impossible, task. FARFETCH’D solves this challenge by providing a set of configurable attack primitives that can execute on real CVM hardware and automated ML-based analysis pipelines. We evaluate the effectiveness of FARFETCH’D on privacy-preserving workloads. Our results show that our approach is effective at pinpointing the vulnerability of privacy apps to side channels and helps evaluate mitigations based on oblivious memory and differential privacy.
View details
CrossCheck: Input Validation for WAN Control Systems
Rishabh Iyer
Isaac Keslassy
Sylvia Ratnasamy
Networked Systems Design and Implementation (NSDI) (2026) (to appear)
We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages.
Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data).
View details
On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration
Yehonathan Refael
Amit Aides
Aviad Barzilai
Vered Silverman
Bolous Jaber
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 886-894
Open-vocabulary object detection (OVD) models offer remarkable flexibility by enabling object detection from arbitrary text queries. Still, the zero-shot performance of the pre-trained models is hampered by the inherent semantic ambiguity of natural language, resulting in low precision that is insufficient for crucial downstream applications. For instance, in the remote sensing (RS) domain, a query for "ship" can yield varied and contextually irrelevant results. To address this for real-time applications, we propose a novel cascaded architecture that synergizes the broad capabilities of a large, pre-trained OVD model with a lightweight, few-shot classifier. Our approach utilizes the frozen weights of the zero-shot model to generate initial, high-recall object-embedding proposals, which are then refined by a compact classifier trained in real time on a handful of user-annotated examples. The core of our contribution is an efficient one-step active learning strategy for selecting the most informative samples for user annotation. Our method identifies an (extremely) small number of uncertain candidates near the theoretical decision boundary using density estimation and then applies clustering to ensure a diverse training set. This targeted sampling enables our cascaded system to elevate performance on standard remote sensing benchmarks. Our work thus presents a practical and resource-efficient framework for adapting foundational models to specific user needs, drastically reducing annotation overhead while achieving high accuracy without costly full-model fine-tuning.
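The sampling idea in this abstract (pick a few uncertain candidates near the decision boundary, then keep them diverse) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a simple margin ranking around an assumed 0.5 score threshold in place of density estimation, and a greedy farthest-point pick in place of clustering. The function name, the `pool_size` and `budget` parameters, and the sample data are all hypothetical.

```python
import math

def select_marginal_samples(scores, embeddings, pool_size, budget):
    """Return indices of up to `budget` uncertain, mutually distant proposals.

    scores     -- classifier confidence per proposal, assumed to lie in [0, 1]
    embeddings -- one feature vector per proposal
    """
    # Step 1 (stand-in for density estimation): keep the `pool_size`
    # proposals whose score is closest to the assumed 0.5 boundary.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    pool = ranked[:pool_size]
    if not pool or budget <= 0:
        return []

    # Step 2 (stand-in for clustering): greedy farthest-point selection,
    # so the samples sent for annotation are spread out in embedding space.
    chosen = [pool[0]]
    while len(chosen) < min(budget, len(pool)):
        best = max(
            (i for i in pool if i not in chosen),
            key=lambda i: min(math.dist(embeddings[i], embeddings[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical data: six proposals with 2-D embeddings.
scores = [0.95, 0.52, 0.49, 0.10, 0.55, 0.48]
embeddings = [(0, 0), (1, 0), (1.1, 0), (5, 5), (4, 4), (1, 0.1)]
picked = select_marginal_samples(scores, embeddings, pool_size=4, budget=2)
```

With these inputs, the most uncertain proposal (index 2) is chosen first, and the second pick is the pooled proposal farthest from it in embedding space rather than one of its near-duplicates.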
View details
A probabilistic framework for learning non‐intrusive corrections to long‐time climate simulations from short‐time training data
Benedikt Barthel
Rob Carver
Fei Sha
Themistoklis Sapsis
Journal of Advances in Modeling Earth Systems (2026)
Despite advances in high performance computing, accurate numerical simulations of global atmospheric dynamics remain a challenge. The resolution required to fully resolve the vast range of scales, as well as the strong coupling with (often not fully understood) physics, renders such simulations computationally infeasible over time horizons relevant for long-term climate risk assessment. While data-driven parameterizations have shown some promise of alleviating these obstacles, the scarcity of high-quality training data and their lack of long-term stability typically hinder their ability to capture the risk of rare extreme events. In this work we present a general strategy for training variational (probabilistic) neural network models to non-intrusively correct under-resolved long-time simulations of turbulent climate systems. The approach is based on the paradigm introduced by Barthel Sorensen et al. (2024, https://doi.org/10.1029/2023ms004122) which involves training a post-processing correction operator on under-resolved simulations nudged toward a high-fidelity reference. Our variational framework enables us to learn the dynamics of the underlying system from very little training data and thus drastically improve the extrapolation capabilities of the previous deterministic state of the art, even when the statistics of that training data are far from converged. We investigate and compare three recently introduced variational network architectures and illustrate the benefits of our approach on an anisotropic quasi-geostrophic flow. For this prototype model our approach is able to accurately capture not only global statistics, but also the anisotropic regional variation and the statistics of multiple extreme event metrics, demonstrating significant improvement over previously introduced deterministic architectures.
View details
Unveiling the Global Landscape of Android Security Updates
Haiyun Deng
Abbas Acar
Esteban Luques
Harun Oz
Ahmet Aris
Selcuk Uluagac
IEEE Transactions on Dependable and Secure Computing (2026)
Android is the world’s leading mobile operating system, with over three billion active devices. Detecting vulnerabilities and ensuring timely patch deployment are critical to maintaining security. The Android Open Source Project (AOSP) has enhanced the transparency of security updates through Security Patch Levels. However, challenges related to update speed and availability persist. In 2022, Google reported that half of the zero-day vulnerabilities discovered in the wild were variations of vulnerabilities that had already been patched. Recent research mainly highlights delays in update distribution, often attributing them to fragmentation and focusing primarily on flagship devices or limited time-frames. Our approach takes a device-centric perspective to investigate Android update patterns, analyzing 567K security update records from 2014 to 2024, covering 904 distinct devices from six key Original Equipment Manufacturers (OEMs) across 98 countries. Our extensive analysis revealed notable differences in update release timing across OEMs, device types, and regions. Our study also examines documented vulnerabilities and weaknesses, while assessing OEM compliance with Android security guidelines. Our study shows that ∼89.7% of vulnerabilities on unpatched Android devices are exploitable without user interaction and with low attack complexity. We also identified delays linked to fragmentation and OEM-specific challenges, and provide actionable insights for improvement.
View details
An experimental evaluation of an AI-powered interactive learning platform
Nicole Miller
Yael Haramaty
Lidan Hackmon
Lior Belinsky
Abraham Oritz Tapia
Lucy Tootill
Scott Siebert
Frontiers in Artificial Intelligence (2026) (to appear)
Generative AI, which is capable of transforming static content into dynamic learning experiences, holds the potential to revolutionize student engagement in educational contexts. However, questions still remain around whether or not these tools are effective at facilitating student learning. In this research, we test the effectiveness of an AI-powered platform incorporating multiple representations and assessment through Learn Your Way, an experimental research platform that transforms textbook chapters into dynamic visual and audio representations. Through a between-subjects, mixed methods experiment with 60 US-based students, we demonstrate that students who used Learn Your Way had a more positive learning experience and had better learning outcomes compared to students learning the same content through a digital textbook. These findings indicate that AI-driven tools, capable of providing choice among interactive representations of content, constitute an effective and promising method for enhancing student learning.
View details
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
Type-Aware Ranking of Urban Similarity from Aerial Imagery
Idan Kligvasser
Yotam Intrator
Yuval Desheh
Aviad Barzilai
Niv Efron
Ehud Rivlin
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 821-829
Estimating and ranking cross-city similarity from aerial imagery is a fundamental challenge in remote sensing and geospatial representation learning. Urban environments differ widely in road layout, marking conventions, and infrastructure design, yet standard visual representations often struggle to disentangle these meaningful structural variations from superficial appearances. In this work, we propose a type-aware contrastive learning framework that measures urban similarity by explicitly modeling distinct infrastructure elements. Leveraging open-vocabulary retrieval, we construct a globally diverse dataset of road-related features, such as intersections, crosswalks, and bus lanes, and train a type-conditioned Vision Transformer that fuses visual features with CLIP-derived semantic embeddings. Crucially, we introduce an adaptive per-type contrastive loss that dynamically emphasizes infrastructure categories with high discriminative power while down-weighting less informative types. To quantify city-level similarity, we aggregate per-type cosine similarities via a lightweight classifier to generate a global city-to-city similarity matrix. Experiments demonstrate that this type-aware approach significantly improves clustering quality and successfully generalizes to unseen cities, establishing a scalable, interpretable foundation for comparative urban analysis.
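As a rough illustration of the aggregation step described in this abstract, the sketch below combines per-type cosine similarities into a single city-to-city score. It is an assumption-laden stand-in: the abstract aggregates with a lightweight classifier, whereas this sketch uses a fixed weighted average, and the type names, weights, and embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def city_similarity(city_a, city_b, type_weights):
    """Weighted mean of per-type cosine similarities over types both cities have.

    city_a, city_b -- dict mapping infrastructure type to an embedding vector
    type_weights   -- dict mapping infrastructure type to a non-negative weight
    """
    shared = [t for t in type_weights if t in city_a and t in city_b]
    total = sum(type_weights[t] for t in shared)
    if total == 0:
        return 0.0
    weighted = sum(type_weights[t] * cosine(city_a[t], city_b[t]) for t in shared)
    return weighted / total


# Hypothetical per-type embeddings for two cities.
city_a = {"intersection": [1.0, 0.0], "crosswalk": [0.0, 1.0]}
city_b = {"intersection": [1.0, 0.0], "crosswalk": [1.0, 0.0]}
weights = {"intersection": 3.0, "crosswalk": 1.0}
score = city_similarity(city_a, city_b, weights)
```

Here the two cities match on intersections and differ completely on crosswalks, so the weights decide how much each infrastructure type contributes to the final score.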
View details
Silicon-Level Sovereignty: Root of Trust in AI Accelerators (Digital Trust & Policy)
https://www.dotmagazine.online (2026)
As artificial intelligence (AI) transitions from experimental pilot programs to mission-critical enterprise operations, traditional software-based security frameworks are proving insufficient against sophisticated infrastructure-level threats. This article introduces the concept of Silicon-Level Sovereignty, a first-principles approach to digital trust that anchors security in the physical hardware rather than the software stack.
We examine the technical architecture of Hardware Root of Trust (RoT), specifically focusing on the roles of Trusted Platform Modules (TPMs) and Secure Enclaves in modern AI accelerators such as GPUs and TPUs. By leveraging cryptographic remote attestation, organizations can move from a model of assumed software integrity to one of verifiable hardware-level proof.
The discussion provides a comparative analysis of industry-leading implementations, including NVIDIA’s Hopper architecture [1, 2], Google’s Titan-backed TPU v5p [3, 4], and Microsoft’s Azure Boost Cerberus system [5, 6], alongside the cluster-scale trust challenges presented by ultra-large systems like xAI’s Colossus [7].
The article concludes that Silicon-Level Sovereignty is no longer an optional security feature but a foundational requirement for establishing the integrity, privacy, and multi-tenant isolation necessary for high-stakes AI workloads.
View details
FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
Victor May
Diganta Misra
Yanqi Luo
Anjali Sridhar
Justine Gehring
Silvio Soares Ribeiro Junior
2026
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
This disclosure describes systems and methods for a multi-agent framework that can automate and scale cognitive work. The framework can, for example, use a cognitive assembly line of specialized computational agents to perform tasks such as research and drafting. A beneficial component could be an adversarial review panel (ARP), which is a multi-agent review system where distinct agent personas critique a generated draft from varied perspectives. The structured feedback from the ARP can be used to automatically iterate on and refine the work product. This approach can improve the intellectual rigor of generated content and reduce the time required for production, which may allow human operators to focus on activities such as strategic oversight and final validation.
View details
Responsive user interfaces enable dynamically adjusting user interfaces based on device-specific aspects such as screen size, aspect ratio, display resolution, etc. However, traditional responsive design fails to account for a user's constraints and the criticality of the task being performed via the UI. Misalignment between the UI design, user context, and task criticality can lead to user error. This disclosure describes techniques, implemented with user permission, for dynamically modifying the layout, information density, and/or interactive physics of a user interface based on a dual-factor analysis of user cognitive state and task criticality. The user's cognitive state can be inferred from behavioral telematics. Task criticality can be inferred from semantic analysis. The information density and other parameters of a user interface are automatically adjusted based on such analyses. Such adjustments include applying or relaxing restrictions on interactivity and adjusting visual prominence of various UI elements to adjust the information density of the user interface. The adjustments can also include adjusting friction as appropriate, hiding certain aspects of the user interface, or other types of adjustments.
View details
VISTA: A Test-Time Self-Improving Video Generation Agent
Hootan Nakhost
Xuan Long Do
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026) (to appear)
Despite rapid advances in text-to-video (T2V) synthesis, generated video quality remains critically dependent on precise user prompts. Existing test-time optimization methods, successful in other domains, struggle with the multi-faceted nature of video. To address this, we introduce VISTA, a novel multi-agent system that autonomously refines prompts to improve video generation. VISTA operates in an iterative loop, first decomposing a user's idea into a structured temporal plan. After generation, the best video is identified through a robust pairwise tournament. This winning video is then critiqued by a trio of specialized agents focusing on visual, audio, and contextual fidelity. Finally, a reasoning agent synthesizes this feedback to introspectively rewrite and enhance the prompt for the next generation cycle. To rigorously evaluate our proposed approach, we introduce MovieGen-Bench, a new benchmark of diverse single- and multi-scene video generation tasks. Experiments show that while prior methods yield inconsistent gains, VISTA consistently improves video quality, achieving up to a 60% pairwise win rate against state-of-the-art baselines. Human evaluators concur, preferring VISTA's outputs in 68% of comparisons.
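The pairwise tournament step mentioned in this abstract can be sketched in a few lines. The abstract does not specify the tournament format or the judge, so this example assumes a simple sequential knockout in which a hypothetical `prefer` callback (standing in for whatever judge compares two videos) returns the better of two candidates.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def pairwise_tournament(candidates: Sequence[T], prefer: Callable[[T, T], T]) -> T:
    """Return the candidate that survives a sequential knockout.

    `prefer(a, b)` is an assumed judge that returns whichever of the two
    candidates it considers better; the current champion meets each
    remaining candidate in turn.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    champion = candidates[0]
    for challenger in candidates[1:]:
        champion = prefer(champion, challenger)
    return champion


# Hypothetical candidates: generated videos with a precomputed quality score.
videos = [
    {"id": "v1", "quality": 0.61},
    {"id": "v2", "quality": 0.83},
    {"id": "v3", "quality": 0.47},
]

def prefer_higher_quality(a, b):
    # Stand-in judge: keep the video with the higher score, first one on ties.
    return a if a["quality"] >= b["quality"] else b

winner = pairwise_tournament(videos, prefer_higher_quality)
```

A knockout needs only one comparison per additional candidate, which matters when each comparison is an expensive model call. A round-robin would be more robust to an inconsistent judge, at the cost of quadratically many comparisons.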
View details