Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10795 publications
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    Abstract: AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative, but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
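    As a concrete illustration of what "successfully migrate to JDK 17" can mean at the project level, the sketch below checks whether a Maven-based repository builds and passes its tests with JAVA_HOME pointed at a JDK 17 installation. This is a minimal sketch under those assumptions, not the FreshBrew harness itself; the repo_dir and jdk17_home parameters are hypothetical.

        # Illustrative success check only, not the benchmark's evaluation code.
        import os
        import subprocess

        def builds_under_jdk17(repo_dir: str, jdk17_home: str) -> bool:
            """Return True if the Maven project compiles and its tests pass under JDK 17."""
            env = dict(os.environ, JAVA_HOME=jdk17_home)  # point Maven at the JDK 17 install
            result = subprocess.run(
                ["mvn", "-B", "-q", "verify"],  # batch mode, quiet; compiles and runs tests
                cwd=repo_dir,
                env=env,
                capture_output=True,
                text=True,
            )
            return result.returncode == 0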
    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv (2026) (to appear)
    Abstract: For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
    AI as a Catalyst for Educational Equity: Addressing Global Teacher Shortages and Learning Disparities
    International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT) (2025)
    Abstract: The global education system is grappling with a critical shortage of teachers, threatening the achievement of universal quality education. This article examines how artificial intelligence (AI) technologies can revolutionize educational access and equity by addressing these systemic challenges. Through a comprehensive analysis of AI-enabled solutions, including personalized learning mechanisms, virtual tutoring systems, and intelligent content distribution platforms, the article explores the transformative potential of these technologies in democratizing education. It investigates the implementation of AI across established educational platforms, examining their effectiveness in providing adaptive learning experiences, breaking down language barriers, and ensuring cultural relevance. The article demonstrates that strategic AI integration can significantly impact learning outcomes while helping to bridge the global teacher shortage gap. It also addresses critical implementation challenges, providing policy recommendations and resource allocation frameworks for successful AI adoption in education systems worldwide. This analysis contributes to the growing body of knowledge on educational technology by offering practical insights into how AI can be leveraged to create more inclusive, effective, and accessible learning environments, ultimately advancing the goal of quality education for all.
    Unprecedented Insights into Maternal Sleep: A Large-scale Longitudinal Analysis of Real-world Wearable Device Data Before, During, and After Pregnancy
    Nichole Young-Lin
    Conor Heneghan
    Logan Schneider
    Logan Niehaus
    Ariel Haney
    Karla Gleichauf
    Jacqueline Shreibati
    Belen Lafon
    Lancet eBioMedicine (2025)
    Abstract: Introduction: Current understanding of pregnancy and postpartum sleep is driven by limited lab or self-reported data. Consumer wearable devices may help reveal longitudinal, real-world sleep patterns. Methods: We analyzed de-identified wearable device data from 2,540 users in the United States and Canada who met strict wear-time requirements (≥80% daily usage for ≥80% of the time periods of interest [12 weeks prepregnancy, throughout pregnancy, and 20 weeks immediately postpartum]). We tracked sleep time and staging using Fitbit devices. Results: Compared to prepregnancy, total sleep time (TST) increased from an average of 425.3±43.5 min to a peak of 447.6±47.6 min at gestational week 10, with ongoing declines throughout pregnancy. Time in bed (TIB) followed a similar pattern. Increased light sleep drove the initial TST rise. Deep and REM sleep decreased significantly throughout pregnancy, with maximum reductions of 19.2±13.8 min (p<0.01) and 9.0±19.2 min (p<0.01) respectively by pregnancy end. Sleep efficiency also declined slightly during pregnancy (median drop from 88.3% to 86.8%). After delivery, TIB remained below the prepregnancy baseline by 14.7±45.7 min at one year postpartum and 15.2±47.7 min at 1.5 years postpartum. Conclusion: This unprecedented look at large-scale, real-world sleep and pregnancy patterns revealed a previously unquantified initial increase in sleep followed by decreases in both quantity and quality as pregnancy progresses. Sleep deficits persist for at least 1.5 years postpartum. These quantified trends can assist clinicians and patients in understanding what to expect.
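    For readers curious about the kind of longitudinal aggregation such an analysis involves, here is a minimal pandas sketch that computes the mean and SD of total sleep time by gestational week, averaging within users first; the column names (user_id, gestational_week, tst_min) are hypothetical and this is not the study's analysis code.

        # Hypothetical columns: user_id, gestational_week, tst_min (nightly total sleep time).
        import pandas as pd

        def weekly_sleep_summary(df: pd.DataFrame) -> pd.DataFrame:
            """Mean and SD of total sleep time per gestational week, one value per user per week."""
            per_user = (
                df.groupby(["gestational_week", "user_id"])["tst_min"]
                  .mean()                      # average each user's nights within a week
                  .reset_index()
            )
            return (
                per_user.groupby("gestational_week")["tst_min"]
                        .agg(mean_tst="mean", sd_tst="std")  # across-user mean and SD
            )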
    Sequentially Auditing Differential Privacy
    Tomas Gonzalez Lara
    Mateo Dulce
    Aaditya Ramdas
    Monica Ribero
    Annual Conference on Neural Information Processing Systems (NeurIPS) (2025)
    Abstract: We propose a practical sequential test for auditing differential privacy guarantees of black-box mechanisms. The test processes streams of mechanisms' outputs, providing anytime-valid inference while controlling the Type I error, and thereby overcomes the fixed-sample-size limitation of previous batch auditing methods. Experiments show this test detects violations with sample sizes that are orders of magnitude smaller than those required by existing methods, across diverse realistic mechanisms. Notably, it identifies DP-SGD privacy violations in under one training run, unlike prior methods that need full model training.
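    One generic way to obtain this kind of anytime-valid, Type I error-controlled test is a betting-style test supermartingale over a stream of distinguishing-game outcomes: under ε-DP, the probability of correctly guessing which of two adjacent datasets produced an output is at most e^ε/(1+e^ε), and rejecting once the accumulated wealth exceeds 1/α controls the Type I error by Ville's inequality. The sketch below shows that textbook construction as an illustration; it is not the paper's exact test.

        # Illustrative betting-style sequential audit; not the paper's procedure.
        # outcomes: stream of 0/1 indicators that an attacker correctly guessed which
        # of two adjacent datasets the mechanism was run on in each round.
        import math

        def sequential_dp_audit(outcomes, eps, alpha=0.05, bet_fraction=0.5):
            p0 = math.exp(eps) / (1.0 + math.exp(eps))  # max success rate under eps-DP
            lam = bet_fraction / p0                     # any value in [0, 1/p0] keeps wealth >= 0
            wealth = 1.0
            for t, correct in enumerate(outcomes, start=1):
                wealth *= 1.0 + lam * (correct - p0)    # nonnegative supermartingale under the null
                if wealth >= 1.0 / alpha:               # Ville's inequality: P(reject | eps-DP) <= alpha
                    return t                            # reject at round t: evidence of a violation
            return None                                 # no rejection on this stream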
    Text to 3D Object Generation for Scalable Room Assembly
    Sonia Laguna
    Alberto García García
    Marie-Julie Rakotosaona
    Stylianos Moschoglou
    Leonhard Helminger
    2025
    Abstract: Modern machine learning models for scene understanding, such as depth estimation and object tracking, rely on large, high-quality datasets that mimic real-world deployment scenarios. To address data scarcity, we introduce an end-to-end system for synthetic data generation that produces scalable, high-quality, and customizable 3D indoor scenes. By integrating text-to-image and multi-view diffusion models with NeRF-based meshing, the system generates high-fidelity 3D assets from text prompts and incorporates them into pre-defined floor plans using a rendering tool, Blender. By incorporating novel loss functions and training strategies into existing methods, our method supports on-demand object generation, bridging the domain gap between synthetic and real-world data. This system advances synthetic data's role in addressing machine learning training limitations, enabling more robust and generalizable models for real-world applications.
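    The overall pipeline described above can be pictured as a short orchestration skeleton. In the sketch below every stage function is a hypothetical placeholder standing in for the text-to-image/multi-view diffusion stage, NeRF-based meshing, and scripted placement in Blender; none of these names come from the paper.

        # Hypothetical pipeline skeleton; each stage is a placeholder, not a real API.
        def generate_multiview_images(prompt):
            raise NotImplementedError("stand-in for text-to-image + multi-view diffusion")

        def mesh_from_nerf(views):
            raise NotImplementedError("stand-in for NeRF-based reconstruction and meshing")

        def place_in_floor_plan(meshes, floor_plan):
            raise NotImplementedError("stand-in for scripted placement and rendering in Blender")

        def assemble_room(object_prompts, floor_plan):
            """Generate one asset per text prompt and place it in a pre-defined floor plan."""
            meshes = [mesh_from_nerf(generate_multiview_images(p)) for p in object_prompts]
            return place_in_floor_plan(meshes, floor_plan)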
    Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
    Zahraa Chorghay
    Arna Ghosh
    Shahab Bakhtiari
    Blake Richards
    (2025) (to appear)
    Abstract: Biological and artificial intelligence systems navigate the fundamental efficiency-robustness tradeoff for optimal encoding, i.e., they must efficiently encode numerous attributes of the input space while also being robust to noise. This challenge is particularly evident in hierarchical processing systems like the human brain. With a view towards understanding how systems navigate the efficiency-robustness tradeoff, we turned to a population geometry framework for analyzing representations in the human visual cortex alongside artificial neural networks (ANNs). In the ventral visual stream, we found general-purpose, scale-free representations characterized by a power law-decaying eigenspectrum in most but not all areas. Of note, certain higher-order visual areas did not have scale-free representations, indicating that scale-free geometry is not a universal property of the brain. In parallel, ANNs trained with a self-supervised learning objective also exhibited scale-free geometry, but not after fine-tuning on a specific task. Based on these empirical results and our analytical insights, we posit that a system's representation geometry is not a universal property and instead depends upon the computational objective.
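    For readers unfamiliar with the scale-free diagnostic, a common way to check for a power law-decaying eigenspectrum is to compute the eigenvalues of the representation's covariance and fit a line in log-log space over a range of ranks; the numpy sketch below is a generic illustration of that idea, not the authors' analysis code, and the fit range is an arbitrary choice.

        # Generic illustration: estimate the power-law decay exponent of an eigenspectrum.
        import numpy as np

        def eigenspectrum_decay_exponent(X, fit_range=(10, 100)):
            """X: (n_samples, n_features) matrix of responses or activations.
            Returns alpha such that eigenvalue_i is roughly proportional to i**(-alpha)."""
            Xc = X - X.mean(axis=0, keepdims=True)          # center each feature
            cov = Xc.T @ Xc / (Xc.shape[0] - 1)             # feature covariance matrix
            eig = np.sort(np.linalg.eigvalsh(cov))[::-1]    # eigenvalues, descending
            ranks = np.arange(1, eig.size + 1)
            lo, hi = fit_range
            slope, _ = np.polyfit(np.log(ranks[lo:hi]), np.log(eig[lo:hi]), 1)
            return -slope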
    Abstract: This IEEE Spectrum article reflects on advocacy for U.S. technological leadership during my Congressional visit through IEEE-USA. Leading an expert group of other distinguished IEEE members, we urged lawmakers to support critical initiatives. Key priorities included sustained funding for federal research institutions like NIST, NASA, and the NSF, reauthorizing the SBIR/STTR programs vital for small business innovation, and passing the CREATE AI Act to democratize AI resources by establishing the National AI Research Resource (NAIRR). We also emphasized strengthening the STEM talent pipeline through the CHIPS and Science Act and expanding high-skilled immigrant visas. We highlighted rapid AI advancements, such as autonomous vehicles and the surge in FDA-approved AI-based medical devices, as underscoring the need for these strategic investments and policy actions. The article conveys a sense of urgency, calling for concrete congressional action to ensure the U.S. maintains its technological edge, while also sharing my personal experiences.
    Abstract: AI products introduce new privacy challenges. Finding the right privacy solution is central to developing innovative products, especially as AI models increasingly handle user data. In this paper, we propose a framework to reason about privacy in AI, and discuss how Privacy Enhancing Technologies (PETs) enable novel user experiences by reducing privacy risks in the AI development lifecycle. We argue that privacy protections are not inherently at odds with utility; on the contrary, we discuss how building privacy into products from the start can create better, more trustworthy experiences for everyone.
    Deflating Deflationism: A Critical Perspective on Debunking Arguments Against AI Mentality
    Geoff Keeling
    Alex Grzankowski
    Winnie Street
    Henry Shevlin
    Under Review, Minds and Machines (2025) (to appear)
    Abstract: Many people feel compelled to interpret, describe, and respond to Large Language Models (LLMs) as if they possess inner mental lives similar to our own. Responses to this phenomenon have varied. Inflationist views endorse the truth of such ascriptions, granting that at least some attributions of mentality to LLMs are warranted. Deflationists instead are more sceptical of these attributions, often cautioning against the risk that anthropomorphic projection may lead to misplaced trust or potentially even confusion about the moral status of LLMs. We advance this debate by assessing two common deflationary arguments against LLM mentality. What we term the robustness strategy aims to undercut one justification for believing that LLMs are minded entities by showing that putatively cognitive and humanlike behaviours are not robust, failing to generalise appropriately. What we term the etiological strategy undercuts attributions of mentality by challenging naive causal explanations of LLM behaviours, offering alternative causal accounts that weaken the case for mental state attributions. While both strategies offer powerful challenges to full-blown inflationism, we find that neither strategy provides a knock-down case against ascriptions of mentality to LLMs simpliciter. With this in mind, we explore two modest forms of inflationism about LLM mentality that permit ascriptions of mentality to LLMs under certain conditions. Practical modest inflationism holds that we can, and perhaps should, mentalise LLMs where it is practical to do so, provided the benefits are weighed against relevant risks including the risk of potentially problematic forms of emotional dependency. Metaphysical modest inflationism holds that we can permissibly attribute those mental states and capacities which can be understood in metaphysically undemanding terms (such as knowledge and belief) while exercising greater caution when attributing more metaphysically demanding mental phenomena such as consciousness.
    Thing2Reality: Enabling Spontaneous Creation of 3D Objects from 2D Content using Generative AI in XR Meetings
    Erzhen Hu
    Mingyi Li
    Jungtaek Hong
    Alex Olwal
    Seongkook Heo
    UIST '25: Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, ACM (2025), 53:1-16
    Abstract: During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have enabled users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, conventional 2D representations of digital objects limit spatial referencing in immersive environments. To address this, we propose Thing2Reality, an Extended Reality (XR) meeting platform that facilitates spontaneous discussions of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts in a collaborative manner. Our user studies revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussion of 2D artifacts.
    Abstract: Continuous Integration (CI) is an essential software development practice that establishes processes to minimize bugs and errors in production. In a similar vein, experimentation on software products is vital for evaluating user satisfaction, quality, performance, and other key business metrics. Experimentation allows product owners to evaluate the user impact of changes, which can help them make informed decisions regarding feature launches. It also allows developers to tweak internal processes and algorithms to maximize the impact of new features and changes, and it can sometimes detect errors that CI does not. Unlike CI systems, experimentation platforms are meant to closely imitate production and usually run the system under test (SUT) against large-scale input. Despite this, experimentation platforms have a lot in common with CI systems, and the mechanisms for continuously integrating and testing changes can be adapted and applied to them. Google Search's experimentation platform started as a command-line tool many years ago. Over time, this tool has evolved into a platform that serves the evaluation needs of many of Google's products, including Search, Assistant, YouTube, Play, and Lens, running thousands of large experiments every day. In this workshop, we will present the evolution of Google Search's experimentation platform and how it was transformed from a simple CLI tool into a platform that works at scale, fulfills continuous experimentation needs, and provides many CI-like functionalities to its users.
    Abstract: This note is a follow-up to Ref. [Naaman, IEEE TAS 2025], describing how to construct Josephson junction, inductor, and mutual inductance models using components that are available in the Keysight ADS core library.
    Spherical dimension
    Bogdan Chornomaz
    Shay Moran
    Tom Waknine
    2025
    Abstract: We introduce and study the spherical dimension, a natural topological relaxation of the VC dimension that unifies several results in learning theory where topology plays a key role in the proofs. The spherical dimension is defined by extending the set of realizable datasets (used to define the VC dimension) to the continuous space of realizable distributions. In this space, a shattered set of size d (in the VC sense) is completed into a continuous object, specifically a d-dimensional sphere of realizable distributions. The spherical dimension is then defined as the dimension of the largest sphere in this space. Thus, the spherical dimension is at least the VC dimension. The spherical dimension serves as a common foundation for leveraging the Borsuk-Ulam theorem and related topological tools. We demonstrate the utility of the spherical dimension in diverse applications, including disambiguations of partial concept classes, reductions from classification to stochastic convex optimization, stability and replicability, and sample compression schemes. Perhaps surprisingly, we show that the open question posed by Alon, Hanneke, Holzman, and Moran (FOCS 2021) of whether there exist non-trivial disambiguations for halfspaces with margin is equivalent to the basic open question of whether the VC and spherical dimensions are finite together.
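    In symbols, the basic relationship stated in the abstract (a paraphrase, not the paper's formal definition) is that a shattered set of size d yields a d-dimensional sphere of realizable distributions, so

        % LaTeX: lower bound relating the two notions, as stated in the abstract
        \mathrm{VC}(\mathcal{H}) \;\le\; \dim_{\mathrm{sph}}(\mathcal{H})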
    Abstract: Creativity in software development is frequently overlooked, particularly in the design of developer tools, which often focus on productivity. This is likely because creativity is not always seen as a goal in software engineering; in part, this can be explained by the unique way in which software engineers relate to creativity as centered around reusability rather than novelty. However, creativity is a critical aspect of software engineering, and importantly, there is a clear possibility for AI to impact creativity in both positive and negative ways. In this article, we explore the differences in goals for designing AI tools for productivity compared to creativity and propose strategies to elevate creativity in the software engineering workflow. Specifically, we apply seamful design to AI-powered software development to consider the role of seamfulness in software development workflows as a way to support creativity.