Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10456 publications
    Preview abstract Storage on Android has evolved significantly over the years, with each new Android version introducing changes aimed at enhancing usability, security, and privacy. While these updates typically help with restricting app access to storage through various mechanisms, they may occasionally introduce new complexities and vulnerabilities. A prime example is the introduction of scoped storage in Android 10, which fundamentally changed how apps interact with files. While intended to enhance user privacy by limiting broad access to shared storage, scoped storage has also presented developers with new challenges and potential vulnerabilities to address. However, despite its significance for user privacy and app functionality, no systematic studies have been performed to study Android’s scoped storage in depth from a security perspective. In this paper, we present the first systematic security analysis of the scoped storage mechanism. To this end, we design and implement a testing tool, named ScopeVerif, that relies on differential analysis to uncover security issues and implementation inconsistencies in Android’s storage. Specifically, ScopeVerif takes a list of security properties and checks if there are any file operations that violate any security properties defined in the official Android documentation. Additionally, we conduct a comprehensive analysis across different Android versions as well as a cross-OEM analysis to identify discrepancies in different implementations and their security implications. Our study identifies both known and unknown issues of scoped storage. Our cross-version analysis highlights undocumented changes as well as partially fixed security loopholes across versions. Additionally, we discovered several vulnerabilities in scoped storage implementations by different OEMs. These vulnerabilities stem from deviations from the documented and correct behavior, which potentially pose security risks. The affected OEMs and Google have acknowledged our findings and offered us bug bounties in response. View details
    Preview abstract Modern deep learning algorithms use variations of gradient descent as their main learning methods. Gradient descent can be understood as the simplest Ordinary Differential Equation (ODE) solver; namely, the Euler method applied to the gradient flow differential equation. Since Euler, many ODE solvers have been devised that follow the gradient flow equation more precisely and more stably. Runge-Kutta (RK) methods provide a family of very powerful explicit and implicit high-order ODE solvers. However, these higher-order solvers have not found wide application in deep learning so far. In this work, we evaluate the performance of higher-order RK solvers when applied in deep learning, study their limitations, and propose ways to overcome these drawbacks. In particular, we explore how to improve their performance by naturally incorporating key ingredients of modern neural network optimizers such as preconditioning, adaptive learning rates, and momentum. View details
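The claim that gradient descent is the Euler method applied to gradient flow can be checked on a toy problem. The sketch below is not the paper's method or code: it assumes a hypothetical diagonal quadratic loss f(x) = 0.5 * sum(a_i * x_i^2), for which the gradient flow dx/dt = -grad f(x) has the closed-form solution x_i(t) = x_i(0) * exp(-a_i * t). It then compares one family member, the classical fourth-order Runge-Kutta step, against the Euler step. All function names are illustrative.

```python
import math


def grad(x, curvatures):
    """Gradient of the toy loss f(x) = 0.5 * sum(a_i * x_i**2) (assumed loss)."""
    return [a * xi for a, xi in zip(curvatures, x)]


def euler_step(x, h, curvatures):
    """One Euler step on dx/dt = -grad f(x); identical to gradient descent with learning rate h."""
    g = grad(x, curvatures)
    return [xi - h * gi for xi, gi in zip(x, g)]


def rk4_step(x, h, curvatures):
    """One classical fourth-order Runge-Kutta step on the same gradient flow."""
    def f(y):
        return [-gi for gi in grad(y, curvatures)]

    k1 = f(x)
    k2 = f([xi + 0.5 * h * k for xi, k in zip(x, k1)])
    k3 = f([xi + 0.5 * h * k for xi, k in zip(x, k2)])
    k4 = f([xi + h * k for xi, k in zip(x, k3)])
    return [
        xi + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def integrate(step, x0, h, n_steps, curvatures):
    """Apply a step function n_steps times starting from x0."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, h, curvatures)
    return x


def exact_flow(x0, t, curvatures):
    """Closed-form gradient-flow solution for the toy quadratic loss."""
    return [xi * math.exp(-a * t) for xi, a in zip(x0, curvatures)]


def max_error(x, reference):
    """Largest coordinate-wise absolute deviation from the reference point."""
    return max(abs(a - b) for a, b in zip(x, reference))
```

With the same step size, the RK4 iterate stays much closer to the exact flow than the Euler iterate on this problem, at the price of four gradient evaluations per step instead of one.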
    Preview abstract Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks. In particular, improvements in reasoning abilities and the expansion of context windows have opened new avenues for leveraging these powerful models. NL2SQL is challenging in that the natural language question is inherently ambiguous, while the SQL generation requires a precise understanding of complex data schema and semantics. One approach to this semantic ambiguity problem is to provide sufficient contextual information. In this work, we explore the performance and the latency trade-offs of the extended context window (a.k.a., long context) offered by Google's state-of-the-art LLM (gemini-1.5-pro). We study the impact of various contextual information, including column example values, question and SQL query pairs, user-provided hints, SQL documentation, and schema. To the best of our knowledge, this is the first work to study how the extended context window and extra contextual information can help NL2SQL generation with respect to both accuracy and latency cost. We show that long context LLMs are robust and do not get lost in the extended contextual information. Additionally, our long-context NL2SQL pipeline based on Google's gemini-1.5-pro achieves strong performance of 67.41% on the BIRD benchmark (dev) without finetuning and expensive self-consistency based techniques. View details
    Preview abstract Initially conceived as a way to explain memory sharing in romantic couples, the concept of transactive memory systems (TMS) has been adopted by organizational psychology, information management, and other fields of study to examine team performance in corporate settings. While findings highlight a clear advantage for human teams with TMS, it's not evident if AI-human teams could also develop such a psychological dynamic. This paper considers AI-human interaction through the lens of TMS and identifies potential opportunities for improvement in this area. View details
    Preview abstract Background: Providers spend a large percentage of their day using electronic health record (EHR) technology and frequently report frustration when EHR tasks are time-consuming and effortful. To solve these challenges, artificial intelligence (AI)–based enhancements to EHR technology are increasingly being deployed. However, AI-based implementations for EHR features often lack user-centered evaluation. Objective: This study evaluates, using a user-centered approach, the implementation of an AI-powered search and clinical discovery tool within an EHR system. Methods: We conducted a mixed methods study consisting of interviews, observations, and surveys for 5 months. Results: High adoption rates for the AI-based features (163/176, 93% users after 3 months) and significant increases across key metrics, including user satisfaction (U=49; P<.001) and perception of time saved (U=49; P<.001), demonstrated that the AI-based features were not only successfully integrated into various clinical workflows but also improved the user experience for clinicians. Conclusions: Our results underscore the feasibility and effectiveness of using a user-centered approach for the deployment of clinical AI tools. High adoption rates and positive user experiences were driven by our user-centered research program, which emphasized close collaboration with users, rapid incorporation of feedback, and tailored user training. This study program can be used as a starting framework for the design and integration of human-centered research methods for AI tool deployment in clinical settings. View details
    HueManity: Probing Fine-Grained Visual Perception in MLLMs
    Rynaa Grover
    Jayant Tamarapalli
    Sahiti Yerramilli
    Nilay Pande
    (2025)
    Preview abstract Multimodal Large Language Models (MLLMs) excel at high-level visual reasoning, but their performance on nuanced perceptual tasks remains surprisingly limited. We present HueManity, a benchmark designed to assess visual perception in MLLMs. The dataset comprises 83,850 images featuring two-character alphanumeric strings embedded in Ishihara test style dot patterns, challenging models on precise pattern recognition. Our evaluation of nine state-of-the-art MLLMs on HueManity demonstrates a significant performance deficit compared to human and traditional computer vision baselines. The best-performing MLLM achieved 33.6% accuracy on the numeric "easy" task and a striking 3% on the alphanumeric "hard" task. In contrast, human participants achieved near-perfect scores (100% and 95.6%), and a fine-tuned ResNet50 model reached accuracies of 96.5% and 94.5%. These results highlight a critical gap in the visual capabilities of current MLLMs. Our analysis further explores potential architectural and training-paradigm factors contributing to this perceptual gap in MLLMs. We will open-source the HueManity dataset and code to foster further research in improving the perceptual robustness of MLLMs. View details
    Preview abstract Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on fixed parameters within linear projections, especially when architectural modifications (e.g., channel dimensions) are introduced. Each scaling iteration typically requires retraining the entire model from the beginning, leading to suboptimal utilization of computational resources. To overcome this limitation, we introduce TokenFormer, a naturally scalable architecture that leverages the attention mechanism exclusively for computations among input tokens and interactions between input tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformer with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This innovative approach allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124 million to 1.4 billion parameters by incrementally adding new key-value parameters, achieving performance comparable to models trained from scratch while greatly reducing training costs. Code and models will be publicly available. View details
    SSDTrain: Faster Large Language Model Training Using SSD-Based Activation Offloading
    Kun Wu
    Jeongmin Brian Park
    Mert Hidayetoğlu
    Vikram Sharma Mailthody
    Sitao Huang
    Steven Lumetta
    Wen-mei Hwu
    Design Automation Conference (DAC) (2025)
    Preview abstract The scaling up of Large Language Models (LLMs) demands more memory than current GPUs can provide, hindering the training process. To address this challenge, we propose SSDTrain to efficiently offload activations, the intermediate tensors produced during LLM training, to SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. SSDTrain is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on Llama, BERT, and T5. Results demonstrate that SSDTrain reduces activation peak memory usage by 45%. It can perfectly overlap the IO with the computation without introducing a performance penalty. SSDTrain can achieve a performance boost of up to 31% compared to the conventional training strategy using the same GPU systems. View details
    Preview abstract Despite the advent of legislation such as the General Data Protection Regulation (GDPR) with its associated "Right to be Forgotten" (RTBF), few, if any, studies have measured user reactions to realistic edge cases with public-interest content. Surveying both users covered by and excluded from RTBF, this vignette-based survey experiment sought to better understand how users think of delisting content from search engine results and what factors influence user perceptions. While leaving information accessible in search engine results generally leads to warmer feelings towards those search engines than delisting it, we find that users do prefer different outcomes depending on contextual elements specific to given cases. We also find that whether a country has active RTBF legislation does seem to be associated with both knowledge and attitudes about RTBF, but is unlikely to explain all of it. These results indicate a complex context around removing public-interest content from search engines’ results; it is essential that experts sensitive to local context perform the review in order to ensure that removal requests are handled in a way that meets users’ expectations. View details
    A Call to Action: Advancing the Conversation Around Neurodivergent Education-Employment Transitions
    Dannie Lynn Fountain
    Vicki Baker
    Kevin Danley
    Closing the Gap (2025)
    Preview abstract Neurodiversity is still largely stigmatized and excluded from DEIB frameworks and related organizational initiatives, despite the increased recognition regarding the benefits of neuroinclusion within the education and corporate spheres. We seek to address this knowledge-to-practice gap through the creation of the Neurodiversity Engagement Framework. By highlighting supports needed for neurodivergent individuals, and those that support them, the framework helps neurodivergent individuals navigate within and across higher education and industry contexts. Informed by an interdisciplinary review of literature from higher education, industry, and corporate leadership contexts, the Neurodiversity Engagement Framework brings to light prevailing challenges within practices and policies, serving as a guide for the creation of a more supportive foundation for neurodiverse individuals to thrive. In this manuscript, readers are encouraged to consider the myriad of impacts that neurodiversity has on higher education and industry experiences and the ways that organizations can be more proactive in their support of this growing population. To conclude, we offer a roadmap for future research and practice to further elucidate ways academic and corporation leaders and policymakers can effectively support neurodivergent individuals. View details
    Linear Elastic Caching via Ski Rental
    Todd Lipcon
    The biennial Conference on Innovative Data Systems Research (2025)
    Preview abstract In this work we study the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache inclusive of not just its misses, but also its memory footprint integrated over time. We demonstrate a theoretical connection to the classic ski rental problem and propose a practical algorithm that combines online caching algorithms with ski rental policies. We also introduce a lightweight machine learning-based algorithm for ski rental that is optimized for production workloads and is easy to integrate within existing database systems. Evaluations on both production workloads in Google Spanner and publicly available traces show that the proposed elastic caching approach can significantly reduce the total cache cost compared to traditional fixed-size cache policies. View details
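For readers unfamiliar with the classic ski rental problem mentioned above, here is a minimal sketch of its textbook deterministic break-even policy. This is not the paper's caching algorithm or its learned policy, and the abstract does not specify how cache decisions map onto renting and buying. The function names and integer costs are assumptions made for illustration.

```python
def break_even_cost(rent, buy, days_used):
    """Total cost paid by the break-even policy.

    The policy rents each day while the rent paid so far is below the
    purchase price, then buys. It does not know days_used in advance.
    Assumes positive integer costs.
    """
    buy_day = -(-buy // rent)  # ceil(buy / rent): the day on which we buy
    if days_used < buy_day:
        return rent * days_used  # the season ended before we ever bought
    return rent * (buy_day - 1) + buy


def offline_optimal_cost(rent, buy, days_used):
    """Cost of the best decision made with full knowledge of days_used."""
    return min(rent * days_used, buy)


def competitive_ratio(rent, buy, days_used):
    """Online cost divided by offline optimal cost for one instance."""
    return break_even_cost(rent, buy, days_used) / offline_optimal_cost(rent, buy, days_used)
```

For any season length, the break-even policy pays less than twice the offline optimum, because the rent paid before buying is always below the purchase price.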
    ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
    Alexander Immer
    Alex Bo-Yuan Chen
    Mariela D. Petkova
    Nirmala A. Iyer
    Luuk Willem Hesselink
    Aparna Dev
    Gudrun Ihrke
    Woohyun Park
    Alyson Petruncio
    Aubrey Weigel
    Wyatt Korff
    Florian Engert
    Jeff W. Lichtman
    Misha B. Ahrens
    International Conference on Learning Representations (ICLR) (2025)
    Preview abstract Data-driven benchmarks have led to significant progress in key scientific modeling domains including weather and structural biology. Here, we present the Zebrafish Activity Prediction Benchmark (ZAPBench), which quantitatively measures progress on the problem of predicting cellular-resolution neural activity throughout an entire vertebrate brain. The benchmark is based on a novel dataset containing 4d light-sheet microscopy recordings of more than 70,000 neurons in a larval zebrafish brain, along with motion stabilized and voxel-level cell segmentations of these data that facilitate development of a variety of forecasting methods. Initial results from a selection of time series and volumetric video modeling approaches achieve better performance than naive baseline methods, but also show room for further improvement. The specific brain used in the activity recording is also undergoing synaptic-level anatomical mapping, which will enable future integration of detailed structural information into ZAP forecasting methods. View details
    DORA Impact of Generative AI in Software Development
    Derek DeBellis
    Daniella Villalba
    Nathen Harvey
    DORA, Google (2025)
    Preview abstract Generative AI is transforming how software is built, offering unprecedented opportunities and raising new challenges. Based on extensive research and developer interviews, this DORA report provides a nuanced understanding of AI's impact on individuals, teams, and organizations. View details
    Preview abstract We consider the Coalition Structure Learning (CSL) problem in multi-agent systems, motivated by the existence of coalitions in many real-world systems, e.g., trading platforms and auction systems. In this problem, there is a hidden coalition structure within a set of $n$ agents, which affects the behavior of the agents in games. Our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the hidden coalition structure. In particular, we consider the setting where in each round, we design and present a game together with a strategy profile to the agents, and receive a multiple-bit observation -- for each agent, we observe whether or not they would like to deviate from the specified strategy in this given game. Our contributions are three-fold: First, we show that we can learn the coalition structure in $O(\log n)$ rounds if we are allowed to choose any normal-form game in each round, matching the information-theoretical lower bound, and the result can be extended to congestion games. Second, in a more restricted setting where we can only choose a graphical game with degree limit $d$, we develop an algorithm to learn the coalition structure in $O(n/d+\log d)$ rounds. Third, when we can only learn the coalition structure through running second-price auctions with personalized reserve prices, we show that the coalition structure can be learned in $O(c\log n)$ rounds, where $c$ is the size of the largest coalition. View details
    Preview abstract Cardinality sketches are compact data structures that efficiently estimate the number of distinct elements across multiple queries while minimizing storage, communication, and computational costs. However, recent research has shown that these sketches can fail under adaptively chosen queries, breaking down after approximately $\tilde{O}(k^2)$ queries, where $k$ is the sketch size. In this work, we overcome this quadratic barrier by designing robust estimators with fine-grained guarantees. Specifically, our constructions can handle an exponential number of adaptive queries, provided that each element participates in at most $\tilde{O}(k^2)$ queries. This effectively shifts the quadratic barrier from the total number of queries to the number of queries sharing the same element, which can be significantly smaller. Beyond cardinality sketches, our approach expands the toolkit for robust algorithm design. View details
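As background on what a cardinality sketch is, the following is a minimal sketch of one standard construction, the k-minimum-values (KMV) sketch. It is not the robust estimator proposed in the paper and offers no protection against adaptively chosen queries. The class name, the use of SHA-256 as the hash, and the parameter `k` as written here are illustrative assumptions.

```python
import hashlib
import heapq


def _hash01(item):
    """Map an item to a pseudo-random float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Keeps only the k smallest hash values seen so far."""

    def __init__(self, k):
        self.k = k
        self._heap = []        # negated hashes, so the heap root is the largest kept hash
        self._members = set()  # hashes currently kept, used to ignore duplicates

    def add(self, item):
        h = _hash01(item)
        if h in self._members:
            return  # already counted this element
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        """Estimate the number of distinct items added."""
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: count is exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest
```

The sketch stores at most `k` values regardless of how many items are added, and repeated items leave the estimate unchanged, which is what makes it useful for distinct counting.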