Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10197 publications
Preview abstract
Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a way to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that proprietary LLMs (Gemini, GPT, Claude) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when the context is not. On the other hand, open-source LLMs (Llama, Mistral, Gemma) hallucinate or abstain often, even with sufficient context. We further categorize cases when the context is useful, and improves accuracy, even though it does not fully answer the query and the model errs without the context. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. Our method improves the fraction of correct answers among times where the model responds by 2--10% for Gemini, GPT, and Gemma.
View details
Preview abstract
Storage on Android has evolved significantly over the years, with each new Android version introducing changes aimed at enhancing usability, security, and privacy. While these updates typically help with restricting app access to storage through various mechanisms, they may occasionally introduce new complexities and vulnerabilities. A prime example is the introduction of scoped storage in Android 10, which fundamentally changed how apps interact with files. While intended to enhance user privacy by limiting broad access to shared storage, scoped storage has also presented developers with new challenges and potential vulnerabilities to address. However, despite its significance for user privacy and app functionality, no systematic studies have been performed to study Android’s scoped storage at depth from a security perspective. In this paper, we present the first systematic security analysis of the scoped storage mechanism. To this end, we design and implement a testing tool, named ScopeVerif, that relies on differential analysis to uncover security issues and implementation inconsistencies in Android’s storage. Specifically, ScopeVerif takes a list of security properties and checks if there are any file operations that violate any security properties defined in the official Android documentation. Additionally, we conduct a comprehensive analysis across different Android versions as well as a cross-OEM analysis to identify discrepancies in different implementations and their security implications. Our study identifies both known and unknown issues of scoped storage. Our cross-version analysis highlights undocumented changes as well as partially fixed security loopholes across versions. Additionally, we discovered several vulnerabilities in scoped storage implementations by different OEMs. These vulnerabilities stem from deviations from the documented and correct behavior, which potentially poses security risks. The affected OEMs and Google have acknowledged our findings and offered us bug bounties in response.
View details
Preview abstract
Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interview with 24 individuals who do and do not identify as having a disability, the research also illustrates the reactions of reattempt, adaptation, and avoidance, which were used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could be leveraged, 20 out of 24 participants did not use accessibility settings on their smartphone. A total of 12 out of 24 participants tried accessibility settings on their smartphones, however identifying accessibility was not for them. This work highlights the need to shift current design practices to better serve the accessibility community.
View details
Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts
Marc Stogaitis
Tajinder Gadh
Richard Allen
Alexei Barski
Robert Bosch
Patrick Robertson
Youngmin Cho
Nivetha Thiruverahan
Aman Raj
Geophysical Journal International (2025), ggae436
Preview abstract
This paper presents a novel approach for estimating the ground shaking intensity using real-time social media data and CCTV footage. Employing the Gemini 1.5 Pro’s (Reid et al. 2024) model, a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data utilizing generative AI and natural language processing. The model’s output, in the form of Modified Mercalli Intensity (MMI) values, align well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to utilize additional sources of knowledge, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, which it presumably acquired during its training, in its reasoning and decision-making processes. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. Gemini’s ability to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini in augmenting our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced data from eyewitness accounts for assessing earthquake impact and providing crisis situational awareness. This approach holds a great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study provides a significant step toward harnessing the power of social media and AI for earthquake disaster mitigation.
View details
Databases in the Era of Memory-Centric Computing
Anastasia Ailamaki
Lawrence Benson
Helena Caminal
Jana Gičeva
Eric Seldar
Lisa Wu Wills
Preview abstract
The increasing disparity between processor core counts and memory bandwidth, coupled with the rising cost and underutilization of memory, introduces a performance and cost Memory Wall and presents a significant challenge to the scalability of database systems. We argue that current processor-centric designs are unsustainable, and we advocate for a shift towards memory-centric computing, where disaggregated memory pools enable cost-effective scaling and robust performance. Database systems are uniquely positioned to leverage memory-centric systems because of their intrinsic data-centric nature. We demonstrate how memory-centric database operations can be realized with current hardware, paving the way for more efficient and scalable data management in the cloud.
View details
PreFix: Optimizing the Performance of Heap-Intensive Applications
Chaitanya Mamatha Ananda
Rajiv Gupta
Han Shen
CGO 2025: International Symposium on Code Generation and Optimization, Las Vegas, NV, USA (to appear)
Preview abstract
Analyses of heap-intensive applications show that a small fraction of heap objects account for the majority of heap accesses and data cache misses. Prior works like HDS and HALO have shown that allocating hot objects in separate memory regions can improve spatial locality leading to better application performance. However, these techniques are constrained in two primary ways, limiting their gains. First, these techniques have Imperfect Separation, polluting the hot memory region with several cold objects. Second, reordering of objects across allocations is not possible as the original object allocation order is preserved. This paper presents a novel technique that achieves near perfect separation of hot objects via a new context mechanism that efficiently identifies hot objects with high precision. This technique, named PreFix, is based upon Preallocating memory for a Fixed small number of hot objects. The program, guided by profiles, is instrumented to compute context information derived from
dynamic object identifiers, that precisely identifies hot object allocations that are then placed at predetermined locations in the preallocated memory. The preallocated memory region for hot objects provides the flexibility to reorder objects across allocations and allows colocation of objects that are part of a hot data stream (HDS), improving spatial locality. The runtime overhead of identifying hot objects is not significant as this optimization is only focused on a small number of static hot allocation sites and dynamic hot objects. While there is an increase in the program’s memory foot-print, it is manageable and can be controlled by limiting the size of the preallocated memory. In addition, PreFix incorporates an object recycling optimization that reuses the same preallocated space to store different objects whose lifetimes are not expected to overlap. Our experiments with 13 heap-intensive applications yields reductions in execution times ranging from 2.77% to 74%. On average PreFix reduces execution time by 21.7% compared to 7.3% by HDS and 14% by HALO. This is due to PreFix’s precision in hot object identification, hot object colocation, and low runtime overhead.
View details
Preview abstract
The problem of contract design addresses the challenge of moral hazard in principle-agent setups. The agent exerts costly efforts that produce a random outcome with an associated reward for the principal. Moral hazard refers to the tension that the principal cannot observe the agent’s effort level hence needs to incentivize the agent only through rewarding the realized effort outcome, i.e., the contract. Bayesian contract design studies the principal’s design problem of an optimal contract when facing an unknown agent characterized by a private Bayesian type. In its most general form, the agent’s type is inherently “multi-parameter” and can arbitrarily affect both the agent’s productivity and effort costs. In contrast, a natural single-parameter setting of much recent interest simplifies the agent’s type to a single value that describes the agent’s cost per unit of effort, whereas agents’ efforts are assumed to be equally
productive.
The main result of this paper is an almost approximation-preserving polynomial-time reduction from the most general multi-parameter Bayesian contract design (BCD) to single-parameter BCD. That is, for any multi-parameter BCD instance I^M, we construct a single-parameter instance I^S such that any β-approximate contract (resp. menu of contracts) of I^S can in turn be converted to a (β − ϵ)-approximate contract (resp. menu of contracts) of I^M. The reduction is in time polynomial in the input size and log(1/ϵ); moreover, when β = 1 (i.e., the given single-parameter solution is exactly optimal), the dependence on 1/ϵ can be removed, leading to a polynomial-time exact reduction. This efficient reduction is somewhat surprising because in the closely related problem of Bayesian mechanism design, a polynomial-time reduction from multi-parameter to single-parameter setting is believed to not exist. Our result demonstrates the intrinsic difficulty of addressing moral hazard in Bayesian contract design, regardless of being single-parameter or multi-parameter.
As byproducts, our reduction answers two open questions in recent literature of algorithmic contract design: (a) it implies that optimal contract design in single-parameter BCD is not in APX unless P=NP even when the agent’s type distribution is regular, answering the open question of [3] in the negative; (b) it implies that the principal’s (order-wise) tight utility gap between using a menu of contracts and a single contract is Θ(n) where n is the number of actions, answering the major open question of [27] for the single-parameter case.
View details
Preview abstract
We study the existence of almost fair and near-optimal solutions to a routing problem as defined in the seminal work of Rosenthal. We focus on the setting where multiple alternative routes are available for each potential request (which corresponds to a potential user of the network). This model captures a collection of diverse applications such as packet routing in communication networks, routing in road networks with multiple alternative routes, and the economics of transportation of goods.
Our recommended routes have provable guarantees in terms of both the total cost and fairness concepts such as approximate envy-freeness. We employ and appropriately combine tools from algorithmic game theory and fair division. Our results apply on two distinct models: the splittable case where the request is split among the selected paths (e.g., routing a fleet of trucks) and the unsplittable case where the request is assigned to one of its designated paths (e.g., a single user request). Finally, we conduct an empirical analysis to test the performance of our approach against simpler baselines using the real world road network of New York City.
View details
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao
Samuel Schulter
Zhixing Zhang
Vijay Kumar B G
Yumin Suh
Manmohan Chandraker
Dimitris N. Metaxas
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Preview abstract
Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distribution changes of PLs. To address these challenges, we propose SAS-Det that tames self-training for OVD from two key perspectives. First, we present a split-and-fusion (SAF) head that splits a standard detection into an open-branch and a closed-branch. This design can reduce noisy supervision from pseudo boxes. Moreover, the two branches learn complementary knowledge from different training data, significantly enhancing performance when fused together. Second, in our view, unlike in closed-set tasks, the PL distributions in OVD are solely determined by the teacher model. We introduce a periodic update strategy to decrease the number of updates to the teacher, thereby decreasing the frequency of changes in PL distributions, which stabilizes the training process. Extensive experiments demonstrate SAS-Det is both efficient and effective. SAS-Det outperforms recent models of the same scale by a clear margin and achieves 37.4 AP50 and 29.1 APr on novel categories of the COCO and LVIS benchmarks, respectively.
View details
Discovering Datasets on the Web Scale: Challenges and Recommendations for Google Dataset Search
Daniel Russell
Stella Dugall
Harvard Data Science Review (2024)
Preview abstract
With the rise of open data in the last two decades, more datasets are online and more people are using them for projects and research. But how do people find datasets? We present the first user study of Google Dataset Search, a dataset-discovery tool that uses a web crawl and open ecosystem to find datasets. Google Dataset Search contains a superset of the datasets in other dataset-discovery tools—a total of 45 million datasets from 13,000 sources. We found that the tool addresses a previously identified need: a search engine for datasets across the entire web, including datasets in other tools. However, the tool introduced new challenges due to its open approach: building a mental model of the tool, making sense of heterogeneous datasets, and learning how to search for datasets. We discuss recommendations for dataset-discovery tools and open research questions.
View details
Using Early Readouts to Mediate Featural Bias in Distillation
Rishabh Tiwari
Durga Sivasubramanian
Anmol Mekala
Ganesh Ramakrishnan
WACV 2024 (2024)
Preview abstract
Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. This vulnerability is aggravated in distillation, where a (student) model may have less representational capacity than the corresponding teacher model. Often, knowledge of specific problem features is used to reweight instances & rebalance the learning process. We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers. We show that these early readouts automatically identify problem instances or groups in the form of confident, incorrect predictions. We improve group fairness measures across benchmark datasets by leveraging these signals to mediate between teacher logits and supervised label. We extend our results to the closely related but distinct problem of domain generalization, which also critically depends on the quality of learned features. We provide secondary analyses that bring insight into the role of feature learning in supervision and distillation.
View details
RewriteLM: An Instruction-Tuned Large LanguageModel for Text Rewriting
Yun Zhu
Simon Tong
Lei Meng
Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18970-18980 (2024)
Preview abstract
In recent years, Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities in text generation tasks expressed through natural language instructions. However, text rewriting is a challenging task, and unintended modifications can negatively impact the system's performance. To address this challenge, we introduce a novel benchmark for text rewriting that covers a wide variety of rewriting types expressed through natural language instructions. Unlike previous benchmarks, which were primarily focused on limited rewrite styles and sentence-level rewriting, our benchmark is specifically designed to facilitate open-ended rewriting of long-form text. Additionally, we present a strong baseline model, RewriteLM, which is an instruction-tuned large language model for text rewriting. The model is trained using supervised fine-tuning, reward training, and reinforcement learning. To minimize human intervention in the data collection process, we develop new data generation strategies: (1) utilizing high-quality, long-form edits from Wikipedia as our primary natural training data source, (2) generating a synthetic dataset that includes diverse edit types and non-Wiki domains using chain-of-thoughts and the capabilities of LLMs, and (3) employing human-designed heuristic rankers to generate preference data. Our experiments demonstrate the effectiveness of our proposed benchmark and baseline model, as well as the benefits of our data collection strategies in minimizing human intervention.
View details
Preview abstract
This paper presents NOMAD (Non-Matching Audio Distance), a differentiable perceptual similarity metric that measures the distance of a degraded signal against non-matching references. The proposed method is based on learning deep feature embeddings via a triplet loss guided by the Neurogram Similarity Index Measure (NSIM) to capture degradation intensity. During inference, the similarity score between any two audio samples is computed through Euclidean distance of their embeddings. NOMAD is fully unsupervised and can be used in general perceptual audio tasks for audio analysis e.g. quality assessment and generative tasks such as speech enhancement and speech synthesis. The proposed method is evaluated with 3 tasks. Ranking degradation intensity, predicting speech quality, and as a loss function for speech enhancement. Results indicate NOMAD outperforms other non-matching reference approaches in both ranking degradation intensity and quality assessment, exhibiting competitive performance with full-reference audio metrics. NOMAD demonstrates a promising technique that mimics human capabilities in assessing audio quality with non-matching references to learn perceptual embeddings without the need for human-generated labels.
View details
LabelMaker: Automatic Semantic Label Generation from RGB-D Trajectories
Silvan Weder
Hermann Blum
Francis Engelmann
Marc Pollefeys
3DV (2024)
Preview abstract
Semantic annotations are indispensable to train or evaluate perception models, yet very costly to acquire. This work introduces a fully automated 2D/3D labeling framework that, without any human intervention, can generate labels for RGB-D scans at equal (or better) level of accuracy than comparable manually annotated datasets such as ScanNet. Our approach is based on an ensemble of state-of-the-art segmentation models and 3D lifting through neural rendering. We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labelling the previously unlabeled ARKitScenes dataset. Code and models are available at https://labelmaker.org/
View details