Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10456 publications
Preview abstract
Mainstream artificial neural network models, such as Deep Neural Networks (DNNs) are computation-heavy and energy-hungry. Weightless Neural Networks (WNNs) are natively built with RAM-based neurons and represent an entirely distinct type of neural network computing compared to DNNs. WNNs are extremely low-latency, low-energy, and suitable for efficient, accurate, edge inference. The WNN approach derives an implicit inspiration from the decoding process observed in the dendritic trees of biological neurons, making neurons based on Random Access Memories (RAMs) and/or Lookup Tables (LUTs) ready-to-deploy neuromorphic digital circuits. Since FPGAs are abundant in LUTs, LUT based WNNs are a natural fit for implementing edge inference in FPGAs.
WNNs has been demonstrated to be an energetically efficient AI model, both in software, as well as in hardware. For instance, the most recent DWN – Differential Weightless Neural Network – model demonstrates up to 135× reduction in energy costs in FPGA implementations compared to other multiplication-free approaches, such as binary neural networks (BNNs) and DiffLogicNet, up to 9% higher accuracy in deployments on constrained devices, and culminate in up to 42.8× reduction in circuit area for ultra-low-cost chip implementations. This tutorial will help participants understand how WNNs work, why WNNs were underdogs for such a long time, and be introduced to the most recent members of the WNN family, such as BTHOWeN , LogicWiSARD, COIN, ULEEN and DWN, and contrast to BNNs and LogicNets.
View details
Preview abstract
Sign Language Translation (SLT) aims to map sign language videos to spoken language text. A common approach leverages gloss annotations as an intermediate representation, decomposing SLT into two sub-tasks: video-to-gloss recognition and gloss-to-text translation. While effective, this paradigm relies on expert-annotated gloss labels, which are costly and increasingly unavailable in many datasets, limiting scalability.
To address this challenge, we propose a gloss-free pseudo gloss generation framework that eliminates the need for human-annotated glosses while preserving the structured intermediate representation. Specifically, we prompt a Large Language Model (LLM) with example text-gloss pairs to extract potential sign-related gloss words from the text by leveraging its in-context learning capability.
To mitigate the inherent misalignment between generated pseudo glosses and sign sequences in the video, we further refine their order by formulating the alignment as a weakly supervised learning problem.
With the reordered pseudo-glosses, additional alignment losses such as CTC can be incorporated to enhance supervision. We train our SLT model—comprising a vision encoder and a translator—under a three-stage pipeline, effectively bridging the gap between sign and spoken language.
Despite its simplicity, our approach outperforms previous state-of-the-art gloss-free frameworks across three SLT benchmarks and achieves competitive results with gloss-based methods.
View details
Preview abstract
Google has a long tradition of open-source software, which encompasses the field of operations research with OR-Tools. In development since 2008, it offers several solvers useful to many OR practitioners:
- PDLP, a revolutionary first-order linear solver that is reshaping the landscape of linear optimisation;
- CP-SAT, an award-winning constraint-programming solver;
- Glop, an accurate linear solver;
- Routing, a vehicle routing solver underpinning Google Maps Platform Route Optimization.
OR-Tools has long had its features accessible from other languages: the core algorithms are implemented in C++ for performance, but users can tap into them in Python, Java, C#, or Go.
It is recently available in Julia too, with a current focus on the linear and constraint solvers, either locally or remotely.
We provide a wrapper for our solvers that brings them to JuMP.jl through MathOptInterface.jl.
This tutorial will walk you through the features of OR-Tools and its solvers, then show examples of using OR-Tools from within Julia, either through JuMP or a lower-level interface.
We will also share our experience of C++-Julia interop.
View details
Preview abstract
Visual in-context learning (VICL), as a new paradigm in computer vision, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. While effective, the existing VICL paradigm exhibits poor generalizability under distribution shifts. In this work, we propose test-time visual in-context tuning (VICT), a method that can learn adaptive VICL models on the fly with a single test sample. Specifically, We flip the role between task prompts and the test sample and use a cycle consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on seven representative vision tasks with 15 corruptions demonstrate that our VICT can improve the generalizability of VICL to unseen new domains
View details
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
Jing Jin
Xiuxiu Yuan
Jun Jiang
Jingtao Zhou
Yiyi Huang
Zheng Xu
Kristen Wright
Jason Mayes
Mark Sherwood
Johnny Lee
Alex Olwal
Ram Iyengar
Na Li
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI), ACM, pp. 23
Preview abstract
Visual programming has the potential of providing novice programmers with a low-code experience to build customized processing pipelines. Existing systems typically require users to build pipelines from scratch, implying that novice users are expected to set up and link appropriate nodes from a blank workspace. In this paper, we introduce InstructPipe, an AI assistant for prototyping machine learning (ML) pipelines with text instructions. We contribute two large language model (LLM) modules and a code interpreter as part of our framework. The LLM modules generate pseudocode for a target pipeline, and the interpreter renders the pipeline in the node-graph editor for further human-AI collaboration. Both technical and user evaluation (N=16) shows that InstructPipe empowers users to streamline their ML pipeline workflow, reduce their learning curve, and leverage open-ended commands to spark innovative ideas.
View details
Preview abstract
Measuring productivity is equivalent to building a model. All models are wrong, but some are useful. Productivity models are often “worryingly selective” (wrong because of omissions). Worrying selectivity can be combated by taking a holistic approach that includes multiple measurements of multiple outcomes. Productivity models should include multiple outcomes, metrics, and methods.
View details
Preview abstract
Recent work suggested utilizing inference compute, showing that scaling of number of samples consistently improves the fractions of problems solved by any attempt, namely the coverage. In this work, we suggest that inference scaling gains should be compared with proper baselines, as some datasets become degenerate when allowing a large number of attempts. We focus on two domains - mathematical reasoning and factual knowledge, showing that for the MATH and Entity Questions datasets, informed answer enumeration obtains similar or even better results than repeated model sampling, with a much lower sample budget. While we believe that inference scaling is a promising approach for unlocking the potential of language models, we recommend carefully selecting models and datasets when applying this method. Otherwise, the results of inference scaling should be interpreted with caution.
View details
Preview abstract
Modern deep learning algorithms use variations of gradient descent as their main learning methods. Gradient descent can be understood as the simplest Ordinary Differential Equation (ODE) solver; namely, the Euler method applied to the gradient flow differential equation. Since Euler, many ODE solvers have been devised that follow the gradient flow equation more precisely and more stably. Runge-Kutta (RK) methods provide a family of very powerful explicit and implicit high-order ODE solvers. However, these higher-order solvers have not found wide application in deep learning so far. In this work, we evaluate the performance of higher-order RK solvers when applied in deep learning, study their limitations, and propose ways to overcome these drawbacks. In particular, we explore how to improve their performance by naturally incorporating key ingredients of modern neural network optimizers such as preconditioning, adaptive learning rates, and momentum.
View details
Concerns Beyond the Accuracy of AI Output
DORA, Google (2025)
Preview abstract
Generative AI's potential for hallucinations and inaccuracies are by far the most discussed limitation in AI-assisted software development. But, whether developers have other concerns about using generative AI in their coding practice has not been thoroughly explored. This article describes the results of in-depth interviews with developers about their other concerns about generative AI in coding, beyond the tools accuracy, and discusses related policy implications for organizations developing software.
View details
Preview abstract
Users of routing services like Apple Maps, Google Maps, and Waze frequently wonder why a given route is proposed. This question particularly arises when dynamic conditions like traffic and road closures cause unusual routes to be proposed. While many such dynamic conditions may exist in a road network at any time, only a small fraction of those conditions are typically relevant to a given user's route. In this work, we give a simple algorithm that identifies a small set of traffic-laden road segments that answer the following question: Which traffic conditions cause a particular shortest traffic-aware route to differ from the shortest traffic-free route? We theoretically and experimentally show that our algorithm generates small and interpretable answers to this question.
View details
SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts
Seonghun Son
Berk Gulmezoglu
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2025)
Preview abstract
Self-modifying code (SMC) allows programs to alter their own instructions, optimizing performance and functionality on x86 processors. Despite its benefits, SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes. In this paper, we explore the security implications of SMC by examining how specific x86 instructions affecting instruction cache lines lead to measurable timing discrepancies between cache hits and misses. These discrepancies facilitate refined cache attacks, making them less noisy and more effective. We introduce novel attack techniques that leverage these timing variations to enhance existing methods such as Prime+Probe and Flush+Reload. Our advanced techniques allow adversaries to more precisely attack cryptographic keys and create covert channels akin
to Spectre across various x86 platforms. Finally, we propose a dynamic detection methodology utilizing hardware performance counters to mitigate these enhanced threats.
View details
I know what I don't know: improving model cascades through confidence tuning
Stephan Rabanser
Nathalie Rauschmayr
Petra Poklukar
Congchao Wang
2025
Preview abstract
Large-scale machine learning models deliver strong performance across a wide range of tasks but come with significant computational and resource constraints. To mitigate these challenges, local smaller models are often deployed alongside larger models, relying on routing and deferral mechanisms to offload complex tasks. However, existing approaches inadequately balance the capabilities of these models, often resulting in unnecessary deferrals or sub-optimal resource usage. In this work we introduce a novel loss function called Gatekeeper for calibrating smaller models in cascade setups. Our approach fine-tunes the smaller model to confidently handle tasks it can perform correctly while deferring complex tasks to the larger model. Moreover, it incorporates a mechanism for managing the trade-off between model performance and deferral accuracy, and is broadly applicable across various tasks and domains without any architectural changes. We evaluated our method on encoder-only, decoder-only, and encoder-decoder architectures. Experiments across image classification, language modeling, and vision-language tasks show that our approach substantially improves deferral performance.
View details
AI as a Catalyst for Educational Equity: Addressing Global Teacher Shortages and Learning Disparities
International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCERT) (2025)
Preview abstract
The global education system is grappling with a critical shortage of teachers, threatening the achievement of universal quality education. This article examines how artificial intelligence (AI) technologies can revolutionize educational access and equity by addressing these systemic challenges. Through a comprehensive article analysis of AI-enabled solutions, including personalized learning mechanisms, virtual tutoring systems, and intelligent content distribution platforms, the article explores the transformative potential of these technologies in democratizing education. The article investigates the implementation of AI across established educational platforms, examining their effectiveness in providing adaptive learning experiences, breaking down language barriers, and ensuring cultural relevance. The article demonstrates that strategic AI integration can significantly impact learning outcomes while helping to bridge the global teacher shortage gap. The article also addresses critical implementation challenges, providing policy recommendations and resource allocation frameworks for successful AI adoption in education systems worldwide. This article analysis contributes to the growing body of knowledge on educational technology by offering practical insights into how AI can be leveraged to create more inclusive, effective, and accessible learning environments, ultimately advancing the goal of quality education for all.
View details
Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning
Preview
Mingfei Lau
Allen Chen
Yeming Fang
Tingting Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), Vienna, Austria (2025), 7466–7492
Preview abstract
Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a way to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that proprietary LLMs (Gemini, GPT, Claude) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when the context is not. On the other hand, open-source LLMs (Llama, Mistral, Gemma) hallucinate or abstain often, even with sufficient context. We further categorize cases when the context is useful, and improves accuracy, even though it does not fully answer the query and the model errs without the context. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. Our method improves the fraction of correct answers among times where the model responds by 2--10% for Gemini, GPT, and Gemma.
View details