Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10270 publications
    Improving simulation-based origin-destination demand calibration using sample segment counts data
    Yechen Li
    Arwa Alanqary
    The 12th Triennial Symposium on Transportation Analysis conference (TRISTAN XII), Okinawa, Japan (2025) (to appear)
    Preview abstract This paper introduces a novel approach to demand estimation that utilizes partial observations of segment-level track counts. Building on established simulation-based demand estimation methods, we present a modified formulation that integrates sample track counts as a regularization term. This approach effectively addresses the underdetermination challenge in demand estimation, moving beyond the conventional reliance on a prior OD matrix. The proposed formulation aims to preserve the distribution of the observed track counts while optimizing the demand to align with observed path-level travel times. We tested this approach on Seattle's highway network with various congestion levels. Our findings reveal significant enhancements in the solution quality, particularly in accurately recovering ground truth demand patterns at both the OD and segment levels. View details
    SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts
    Seonghun Son
    Berk Gulmezoglu
    ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2025) (to appear)
    Preview abstract Self-modifying code (SMC) allows programs to alter their own instructions, optimizing performance and functionality on x86 processors. Despite its benefits, SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes. In this paper, we explore the security implications of SMC by examining how specific x86 instructions affecting instruction cache lines lead to measurable timing discrepancies between cache hits and misses. These discrepancies facilitate refined cache attacks, making them less noisy and more effective. We introduce novel attack techniques that leverage these timing variations to enhance existing methods such as Prime+Probe and Flush+Reload. Our advanced techniques allow adversaries to more precisely attack cryptographic keys and create covert channels akin to Spectre across various x86 platforms. Finally, we propose a dynamic detection methodology utilizing hardware performance counters to mitigate these enhanced threats. View details
    Preview abstract Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a way to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that proprietary LLMs (Gemini, GPT, Claude) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when the context is not. On the other hand, open-source LLMs (Llama, Mistral, Gemma) hallucinate or abstain often, even with sufficient context. We further categorize cases when the context is useful, and improves accuracy, even though it does not fully answer the query and the model errs without the context. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. Our method improves the fraction of correct answers among times where the model responds by 2--10% for Gemini, GPT, and Gemma. View details
    Context is Key for Agent Security
    Eugene Bagdasaryan
    Lillian Tsai
    arXiv (2025)
    Preview abstract Judging the safety of an action, whether taken by a human or a system, must take into account the context in which the action takes place. For example, deleting an email from a user's mailbox may or may not be appropriate depending on the email's content, the user's goals, or even available space. Systems today that make these judgements---providing security against harmful or inappropriate actions---rely on manually-crafted policies or user confirmation for each relevant context. With the upcoming deployment of systems like generalist agents, we argue that we must rethink security designs to adapt to the scale of contexts and capabilities of these systems. As a first step, this paper explores contextual security in the domain of agents and proposes contextual security for agents (Conseca), a framework to generate just-in-time, contextual, and human-verifiable security policies. View details
    Scaling Laws for Downstream Task Performance in Machine Translation
    Hussein Hazimeh
    Natalia Ponomareva
    Sanmi Koyejo
    International Conference on Learning Representations (ICLR) (2025) (to appear)
    Preview abstract Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the \emph{pretraining} data and its size affect downstream performance (translation quality) as judged by: downstream cross-entropy and translation quality metrics such as BLEU and COMET scores. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and translation quality scores improve monotonically with more pretraining data. In such cases, we show that it is possible to predict the downstream translation quality metrics with good accuracy using a log-law. However, there are cases where moderate misalignment causes the downstream translation scores to fluctuate or get worse with more pretraining, whereas downstream cross-entropy monotonically improves. By analyzing these, we provide new practical insights for choosing appropriate pretraining data. View details
    Linear Elastic Caching via Ski Rental
    Todd Lipcon
    The biennial Conference on Innovative Data Systems Research (2025)
    Preview abstract In this work we study the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache inclusive of not just its misses, but also its memory footprint integrated over time. We demonstrate a theoretical connection to the classic ski rental problem and propose a practical algorithm that combines online caching algorithms with ski rental policies. We also introduce a lightweight machine learning-based algorithm for ski rental that is optimized for production workloads and is easy to integrate within existing database systems. Evaluations on both production workloads in Google Spanner and publicly available traces show that the proposed elastic caching approach can significantly reduce the total cache cost compared to traditional fixed-size cache policies. View details
    Preview abstract Recent advances in long-context large language models (LLMs) have led to the emerging paradigm of many-shot in-context learning (ICL), where it is observed that scaling many more demonstrating examples beyond the conventional few-shot setup in the context can lead to performance benefits. However, despite its promise, it is unclear what aspects dominate the benefits and whether simply scaling to more examples is the most effective way of improving many-shot ICL. In this work, we first provide an analysis of the factors driving many-shot ICL, and we find that 1) many-shot performance can still be attributed to often a few disproportionately influential examples and 2) identifying such influential examples ("optimize") and using them as demonstrations to regenerate new examples ("generate") can lead to further improvements. Inspired by the findings, we propose BRIDGE, an algorithm that alternates between the optimize step with Bayesian optimization to discover the influential sets of examples and the generate step to reuse this set to expand the reasoning paths of the examples back to the many-shot regime automatically. On Gemini, Claude, and Mistral LLMs of different sizes, we show that BRIDGE to significant improvements across a diverse set of tasks, including symbolic reasoning, numerical reasoning, and code generation. View details
    Automatic Synthesis of Specialized Hash Function
    Leonardo G Fae
    Fernando Magno Quintao Pereira
    Renato B Hoffmann
    Dalvan Grieber
    2025
    Preview abstract https://www.overleaf.com/project/65ba7d45dae2bce751dba252 Hashing is a fundamental operation in various computer sci- ence applications. Despite the prevalence of specific key formats like social security numbers, MAC addresses, plate numbers, and URLs, hashing libraries typically treat them as general byte sequences. This paper introduces a technique for synthesizing specialized hash functions tailored to par- ticular byte formats. The proposed code generation method leverages three prevalent patterns: (i) fixed-length keys, (ii) keys with common subsequences, and (iii) keys ranging on predetermined sequences of bytes. The code generation pro- cess involves two algorithms: one identifies relevant regular expressions within key examples, and the other generates specialized hash functions based on these expressions. This approach, straightforward to implement, showcases improve- ments over highly optimized hash function implementations. Comparative analysis demonstrates that our synthetic func- tions outperform counterparts in the C++ Standard Template Library and the Google Abseil Library, achieving speedups ranging from 2% to 11%, depending on the key format. View details
    SSDTrain: Faster Large Language Model Training Using SSD-Based Activation Offloading
    Mert Hidayetoğlu
    Steven Lumetta
    Kun Wu
    Sitao Huang
    Jeongmin Brian Park
    Wen-mei Hwu
    Vikram Sharma Mailthody
    Design Automation Conference (DAC) (2025)
    Preview abstract The scaling up of Large Language Models (LLMs) demands more memory than current GPUs can provide, hindering the training process. To address this challenge, we propose SSDTrain to efficiently offload activations, the intermediate tensors produced during LLM training, to SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. SSDTrain is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on Llama, BERT, and T5. Results demonstrate that SSDTrain effectively reduces 45% of the activation peak memory usage. It can perfectly overlap the IO with the computation without introducing performance penalty. SSDTrain can achieve a performance boost of up to 31% compared to the conventional training strategy using the same GPU systems. View details
    Preview abstract In today's rapidly evolving business landscape, Governance, Risk, and Compliance (GRC) leaders in large, complex organizations face unprecedented challenges. The cloud has revolutionized how businesses operate, offering unprecedented scalability, flexibility, cost-efficiency, additional security and resilience. However, this transformation also presents new challenges for GRC professionals. In a cloud-native world, where applications are built and deployed in dynamic, distributed environments, traditional GRC on-prem approaches, manual processes and spreadsheets struggle to keep pace. The key to success lies in embracing a data-driven GRC strategy that leverages the power of the cloud to enhance agility, visibility, and resilience. View details
    InstructPipe: Building Visual Programming Pipelines in Visual Blocks with Human Instructions Using LLMs
    Alex Olwal
    Mark Sherwood
    Jing Jin
    Na Li
    Jingtao Zhou
    Jun Jiang
    Ram Iyengar
    Zhongyi Zhou
    Yiyi Huang
    Kristen Wright
    Xiuxiu Yuan
    Jason Mayes
    Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI), ACM, pp. 23
    Preview abstract Visual programming provides beginner-level programmers with a coding-free experience to build their customized pipelines. Existing systems require users to build a pipeline entirely from scratch, implying that novice users need to set up and link appropriate nodes all by themselves, starting from a blank workspace. We present InstructPipe, an AI assistant that enables users to start prototyping machine learning (ML) pipelines with text instructions. We designed two LLM modules and a code interpreter to execute our solution. LLM modules generate pseudocode of a target pipeline, and the interpreter renders a pipeline in the node-graph editor for further human-AI collaboration. Technical evaluations reveal that InstructPipe reduces user interactions by 81.1% compared to traditional methods. Our user study (N=16) showed that InstructPipe empowers novice users to streamline their workflow in creating desired ML pipelines, reduce their learning curve, and spark innovative ideas with open-ended commands. View details
    AI as a Catalyst for Educational Equity: Addressing Global Teacher Shortages and Learning Disparities
    International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCERT) (2025)
    Preview abstract The global education system is grappling with a critical shortage of teachers, threatening the achievement of universal quality education. This article examines how artificial intelligence (AI) technologies can revolutionize educational access and equity by addressing these systemic challenges. Through a comprehensive article analysis of AI-enabled solutions, including personalized learning mechanisms, virtual tutoring systems, and intelligent content distribution platforms, the article explores the transformative potential of these technologies in democratizing education. The article investigates the implementation of AI across established educational platforms, examining their effectiveness in providing adaptive learning experiences, breaking down language barriers, and ensuring cultural relevance. The article demonstrates that strategic AI integration can significantly impact learning outcomes while helping to bridge the global teacher shortage gap. The article also addresses critical implementation challenges, providing policy recommendations and resource allocation frameworks for successful AI adoption in education systems worldwide. This article analysis contributes to the growing body of knowledge on educational technology by offering practical insights into how AI can be leveraged to create more inclusive, effective, and accessible learning environments, ultimately advancing the goal of quality education for all. View details
    Triaging mammography with artificial intelligence: an implementation study
    Samantha Winter
    Atilla Kiraly
    Scott Mayer McKinney
    Jie Yang
    Krish Eswaran
    Shravya Shetty
    Timo Kohlberger
    Stacey Caron
    Fereshteh Mahvar
    David Melnick
    Sonya Bhole
    Arnav Agharwal
    David V. Schacht
    Dipti Gupta
    Basil Mustafa
    Alejandra Maciel
    Martha Sevenich
    Sarah M. Friedewald
    Mozziyar Etemadi
    Sunny Jansen
    Shiro Kadowaki
    Gavin Duggan
    Rubin Zhang
    Luca Speroni
    Breast Cancer Research and Treatment (2025)
    Preview abstract Purpose Many breast centers are unable to provide immediate results at the time of screening mammography which results in delayed patient care. Implementing artificial intelligence (AI) could identify patients who may have breast cancer and accelerate the time to diagnostic imaging and biopsy diagnosis. Methods In this prospective randomized, unblinded, controlled implementation study we enrolled 1000 screening participants between March 2021 and May 2022. The experimental group used an AI system to prioritize a subset of cases for same-visit radiologist evaluation, and same-visit diagnostic workup if necessary. The control group followed the standard of care. The primary operational endpoints were time to additional imaging (TA) and time to biopsy diagnosis (TB). Results The final cohort included 463 experimental and 392 control participants. The one-sided Mann-Whitney U test was employed for analysis of TA and TB. In the control group, the TA was 25.6 days [95% CI 22.0–29.9] and TB was 55.9 days [95% CI 45.5–69.6]. In comparison, the experimental group's mean TA was reduced by 25% (6.4 fewer days [one-sided 95% CI > 0.3], p<0.001) and mean TB was reduced by 30% (16.8 fewer days; 95% CI > 5.1], p=0.003). The time reduction was more pronounced for AI-prioritized participants in the experimental group. All participants eventually diagnosed with breast cancer were prioritized by the AI. Conclusions Implementing AI prioritization can accelerate care timelines for patients requiring additional workup, while maintaining the efficiency of delayed interpretation for most participants. Reducing diagnostic delays could contribute to improved patient adherence, decreased anxiety and addressing disparities in access to timely care. View details
    Preview abstract Despite the advent of legislation such as the General Data Protection Regulation (GDPR) with its associated "Right to be Forgotten" (RTBF), few, if any, studies have measured user reactions to realistic edge cases with public-interest content. Surveying both users covered by and excluded from RTBF, this vignette-based survey experiment sought to better understand how users think of delisting content from search engine results and what factors influence user perceptions. While leaving information accessible in search engine results generally leads to warmer feelings towards those search engines than delisting it, we find that users do prefer different outcomes depending on contextual elements specific to given cases. We also find that whether a country has active RTBF legislation does seem to be associated with both knowledge and attitudes about RTBF, but is unlikely to explain all of it. These results indicate a complex context around removing public-interest content from search engines’ results; it is essential that experts sensitive to local context perform the review in order to ensure that removal requests are handled in a way that meets users’ expectations. View details