Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 11218 publications
Preview abstract
How many T gates are needed to approximate an arbitrary n-qubit quantum state to within
a given precision ϵ? Improving prior work of Low, Kliuchnikov and Schaeffer, we show that the
optimal asymptotic scaling is Θ(sqrt{2^n log(1/ε)} + log(1/ε)) if we allow an unlimited number of ancilla qubits. We also show that this is the optimal T-count for implementing an arbitrary
diagonal n-qubit unitary to within error ϵ. We describe an application to batched synthesis of
single-qubit unitaries: we can approximate a tensor product of m = O(log log(1/ϵ)) arbitrary
single-qubit unitaries to within error ϵ with the same asymptotic T-count as is required to
approximate just one single-qubit unitary.
View details
Agentic AI Infrastructure in Practice: Learn These Key Hurdles to Deploy Production AI Agents Efficiently
https://swisscognitive.ch/ (2026)
Preview abstract
The emergence of Agentic AI—autonomous systems capable of reasoning, decision-making, and multi-step execution—represents a paradigm shift in enterprise technology. Moving beyond simple generative tasks, these agents offer the potential to solve long-standing industry pain points, with over 90% of enterprises planning integration within the next three years. However, the transition from successful proof-of-concept (PoC) to a resilient, production-grade system presents significant hurdles.
This article categorizes these challenges into three primary domains:
Technical and Engineering Hurdles: Issues such as "entangled workflows" that complicate debugging, the struggle to maintain output quality and mitigate hallucinations, and the unpredictability caused by shifting underlying models or data sources.
People, Process, and Ecosystem Hurdles: The high operational costs and unclear ROI of large models, the necessity of a new "Agent Ops" skillset, the complexity of integrating agents with disparate enterprise systems, and a rapidly evolving regulatory landscape.
The Pace of Change and Security risks: The technical debt incurred by shifting software frameworks and the expanded attack surface created by autonomous agents.
The article concludes that successful deployment requires a shift from informal "vibe-testing" to rigorous engineering discipline. By adopting code-first frameworks, establishing robust evaluation metrics (KPIs), and prioritizing functional deployment over theoretical optimization, organizations can effectively manage the lifecycle of Agentic AI and realize its transformative business value.
View details
Expert evaluation of LLM world models: A high-Tc superconductivity case study
Haoyu Guo
Maria Tikhanovskaya
Paul Raccuglia
Alexey Vlaskin
Chris Co
Scott Ellsworth
Matthew Abraham
Lizzie Dorfman
Peter Armitage
Chunhan Feng
Antoine Georges
Olivier Gingras
Dominik Kiese
Steve Kivelson
Vadim Oganesyan
Brad Ramshaw
Subir Sachdev
Senthil Todadri
John Tranquada
Eun-Ah Kim
Proceedings of the National Academy of Sciences (2026)
Preview abstract
Large Language Models (LLMs) show great promise as a powerful tool for scientific literature exploration. However, their effectiveness in providing scientifically accurate and comprehensive answers to complex questions within specialized domains remains an active area of research. This work evaluates the performance of six different LLM-based systems for answering scientific literature questions, including commercially available closed models and a custom retrieval-augmented generation (RAG) system capable of retrieving images alongside text. We conduct a rigorous expert evaluation of the systems in the domain of high-temperature cuprate superconductors, a research area that involves material science, experimental physics, computation, and theoretical physics. We use an expert-curated database of 1726 scientific papers and a set of 67 expert-formulated questions. The evaluation employs a multi-faceted rubric assessing balanced perspectives, factual comprehensiveness, succinctness, evidentiary support, and image relevance. Our results demonstrate that RAG-based systems, powered by curated data and multimodal retrieval, outperform existing closed models across key metrics, particularly in providing comprehensive and well-supported answers, and in retrieving relevant visual information. This study provides valuable insights into designing and evaluating specialized scientific literature understanding systems, particularly with expert involvement, while also highlighting the importance of rich, domain-specific data in such systems.
View details
Unveiling the Global Landscape of Android Security Updates
Haiyun Deng
Abbas Acar
Esteban Luques
Harun Oz
Ahmet Aris
Selcuk Uluagac
IEEE Transactions on Dependable and Secure Computing (2026)
Preview abstract
Android is the world’s leading mobile operating
system, with over three billion active devices. Detecting vulnerabilities and ensuring timely patch deployment are critical to
maintaining security. The Android Open Source Project (AOSP)
has enhanced the transparency of security updates through Security Patch Levels. However, challenges related to update speed
and availability persist. In 2022, Google reported that half of the
zero-day vulnerabilities discovered in the wild were variations of
vulnerabilities that had already been patched. Recent research
mainly highlights delays in update distribution, often attributing
them to fragmentation and focusing primarily on flagship devices
or limited time-frames. Our approach takes a device-centric
perspective to investigate Android update patterns, analyzing
567K security update records from 2014 to 2024, covering 904
distinct devices from six key Original Equipment Manufacturers
(OEMs) across 98 countries. Our extensive analysis revealed
notable differences in update release timing across OEMs, device types, and regions. Our study also examines documented
vulnerabilities and weaknesses, while assessing OEM compliance
with Android security guidelines. Our study shows that ∼89.7%
of vulnerabilities on unpatched Android devices are exploitable
without user interaction and with low attack complexity. We
also identified delays linked to fragmentation and OEM-specific
challenges, and provide actionable insights for improvement.
View details
Preview abstract
We prove the following asymptotically tight lower bound for k-color discrepancy: For any k ≥ 2, there exists a hypergraph with n vertices such that its k-color discrepancy is at least Ω(√n). This improves on the previously known lower bound of Ω(√n/ log k) due to Caragiannis et al. [CLS25]. As an application, we show that our result implies improved lower bounds for group fair division.
View details
Preview abstract
Responsive user interfaces enable dynamically adjusting user interfaces based on device-specific aspects such as screen size, aspect ratio, display resolution, etc. However, traditional responsive design fails to account for different types of constraints of a user and task criticality of the task being performed via the UI. Misalignment between the UI design, user context and task criticality can lead to user error. This disclosure describes techniques, implemented with user permission, for dynamically modifying the layout, information density, and/or interactive physics of a user interface based on a dual-factor analysis of user cognitive state and task criticality. The user's cognitive state can be inferred from behavioral telematics. Task criticality can be inferred from semantic analysis. The information density and other parameters of a user interface are automatically adjusted based on such analyses. Such adjustments include applying or relaxing restrictions on interactivity and adjusting visual prominence of various UI elements to adjust the information density of the user interface. The adjustments can also include adjusting friction as appropriate, hiding certain aspects of the user interface, or other types of adjustments.
View details
Phoenix: Rowhammer Attacks on DDR5 with Self-Correcting Synchronization
Preview
Michele Marazzi
Kaveh Razavi
Salman Qazi
Diego Meyer
Patrick Jattke
IEEE Security & Privacy (S&P) (2026)
Preview abstract
Managing compiler build errors that can arise during infrastructure upgrades in large, polyglot codebases may be challenging, as manual remediation can be slow and some automated tools may not support modern language syntax. A system can provide automated error remediation by ingesting compiler diagnostics and analyzing source code using an Abstract Syntax Tree (AST). A recursive scope resolution algorithm, for example, can traverse the AST to identify a specific and narrowly-scoped code block at which to apply an error suppression. Conversely, this algorithmic complexity can be bypassed when lexical scope resolution is not required, and the system can identify the specific location of error suppressions directly from the error's exact coordinates. The system may then generate and apply language-specific patches, such as structured comments for JavaScript source files or line-scoped comments for TypeScript source files, for example, by using a transactional rewrite engine. This approach can provide a scalable method for managing automated code remediation, which may facilitate infrastructure upgrades by reducing the need for manual intervention.
View details
A Computer Vision Problem in Flatland
Erin Connelly
Annalisa Crannell
Timothy Duff
Rekha R. Thomas
SIAM Journal on Applied Algebra and Geometry, 10 (2026), pp. 14-45
Preview abstract
When is it possible to project two sets of labeled points of equal cardinality lying in a pair of projective planes to the same image on a projective line? We give a complete answer to this question, obtaining the following results. We first show that such a pair of projections exist if and only if the two point sets are themselves images of a common point set in projective space. Moreover, we find that for generic pairs of point sets, a common projection exists if and only if their cardinality is at most seven. In these cases, we give an explicit description of the loci of projection centers that enable a common image.
View details
ARM MTE Performance in Practice
Preview
Taehyun Noh
Yingchen Wang
Tal Garfinkel
Mahesh Madhav
Mattan Erez
Shravan Narayan
Usenix Security (2026)
VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation
Mihir Parmar
Dj Dvijotham
Mirko Montanari
ACL 2026
Preview abstract
Artificial intelligence is rapidly evolving, marked by the emergence of Large Language Model (LLM) agents – systems capable of complex reasoning, planning, and interaction with digital and physical environments. These agents, powered by advancements in LLMs, demonstrate remarkable capabilities across diverse domains, including finance, healthcare, web navigation, software development, and daily task assistance. Unlike traditional AI systems, LLM agents can perceive their surroundings, formulate multi-step plans, utilize external tools and APIs, access memory or knowledge bases, and execute actions to achieve specified goals. This ability to act upon the world, however, introduces significant safety and security challenges.
The safety paradigms developed for traditional LLMs, primarily focused on mitigating harmful textual outputs (e.g., toxicity, bias), are insufficient for safeguarding LLM agents. Agents interacting with dynamic environments and executing actions present a broader attack surface and new categories of risk. These include performing unsafe operations, violating privacy constraints through improper data handling or access control failures, deviating from user objectives (task misalignment), and susceptibility to novel manipulation techniques like indirect prompt injection and memory poisoning. Ensuring the trustworthy operation of these powerful agents is paramount, especially as they are integrated into high-stakes applications. To address this critical challenge, we introduce VeriGuard, a novel framework designed to enhance the safety and reliability of LLM agents by interactively verifying their policies and the actions. VeriGuard integrates a verification module that intercepts code-based actions proposed by the agent. In the first step, VeriGuard will generates and verifies the policies. The policies are rigorously checked against a set of predefined safety and security specifications Then each action will be verified to make sure it will align with the agent specification. This interactive verification loop ensures that the agent's behavior remains within safe operational bounds, effectively preventing the execution of harmful or unintended operations. By verifying each step, VeriGuard provides a robust safeguard, substantially improving the trustworthiness of LLM agents in complex, real-world environments.
View details
Preview abstract
We consider a setting where we have a ground set ℳ together with real-valued set functions f₁, … , f_n, and the goal is to partition ℳ into two sets S₁,S₂ such that |f_i(S₁) - f_i(S₂)| is small for every i. Many results in discrepancy theory can be stated in this form with the functions f_i being additive. In this work, we initiate the study of the unstructured case where f_i is not assumed to be additive. We show that even without the additivity assumption, the upper bound remains at most O(√{n log n}).
Our result has implications on the fair allocation of indivisible goods. In particular, we show that a consensus halving up to O(√{n log n}) goods always exists for n agents with monotone utilities. Previously, only an O(n) bound was known for this setting.
View details
Improving Low-Vision Chart Accessibility via On-Cursor Visual Context
Yotam Sechayk
Hennes Rave
Max Radler
Mark Colley
Ariel Shamir
Takeo Igarashi
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI 26)
Preview abstract
Despite widespread use, charts remain largely inaccessible for Low-Vision Individuals (LVI). Reading charts requires viewing data points within a global context, which is difficult for LVI who may rely on magnification or experience a partial field of vision. We aim to improve exploration by providing visual access to critical context. To inform this, we conducted a formative study with five LVI. We identified four fundamental contextual elements common across chart types: axes, legend, grid lines, and the overview. We propose two pointer-based interaction methods to provide this context: Dynamic Context, a novel focus+context interaction, and Mini-map, which adapts overview+detail principles for LVI. In a study with N=22 LVI, we compared both methods and evaluated their integration to current tools. Our results show that Dynamic Context had significant positive impact on access, usability, and effort reduction; however, worsened visual load. Mini-map strengthened spatial understanding, but was less preferred for this task. We offer design insights to guide the development of future systems that support LVI with visual context while balancing visual load.
View details
Preview abstract
As the ECMAScript specification evolves, industrial-scale JavaScript compilers face the challenge of supporting modern language syntax while maintaining compatibility for diverse execution environments. Traditionally, compilers solve this by running transpilation passes in a monolithic pipeline, where the transpilation passes are chosen to execute strictly based on a target language level. This results in significant computational waste, as compilers perform expensive Abstract Syntax Tree (AST) traversals to lower features that may not exist in the actual input source code. We present a static analysis improvement that conditionally executes transpiler passes based on accurately tracking and dynamically maintaining the exact set of language features seen in the compilation unit throughout the transpilation process. It is implemented in the production Google Closure Compiler. By populating and maintaining a FeatureSet at every JavaScript script-level, it dynamically skips running the unnecessary lowering passes. We detail the architectural safeguards - including strategic pass ordering and dynamic validation of the transpiled code for feature-correctness. Evaluation of this improvement on large-scale production applications produced a considerable reduction in compilation time and saved compute and memory usage.
View details
Type-Aware Ranking of Urban Similarity from Aerial Imagery
Idan Kligvasser
Yotam Intrator
Yuval Desheh
Aviad Barzilai
Niv Efron
Ehud Rivlin
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 821-829
Preview abstract
Estimating and ranking cross-city similarity from aerial imagery is a fundamental challenge in remote sensing and geospatial representation learning. Urban environments differ widely in road layout, marking conventions, and infrastructure design, yet standard visual representations often struggle to disentangle these meaningful structural variations from superficial appearances. In this work, we propose a type-aware contrastive learning framework that measures urban similarity by explicitly modeling distinct infrastructure elements. Leveraging open-vocabulary retrieval, we construct a globally diverse dataset of road-related features, such as intersections, crosswalks, and bus lanes, and train a type-conditioned Vision Transformer that fuses visual features with CLIP-derived semantic embeddings. Crucially, we introduce an adaptive per-type contrastive loss that dynamically emphasizes infrastructure categories with high discriminative power while down-weighting less informative types. To quantify city-level similarity, we aggregate per-type cosine similarities via a lightweight classifier to generate a global city-to-city similarity matrix. Experiments demonstrate that this type-aware approach significantly improves clustering quality and successfully generalizes to unseen cities, establishing a scalable, interpretable foundation for comparative urban analysis.
View details