Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 10129 publications
Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines
Aditya U Kale
Alastair Dennison
Alexander Martindale
An Wen Chan
Andrew Beam
Benjamin Ng
Cecilia S. Lee
Christopher Kelly
Christopher Yau
David Moher
Gary Collins
Lauren Oakden-Rayner
Lavinia Ferrante di Ruffano
Melanie Calvert
Melissa D McCradden
Pearse Keane
Robert Golub
Samantha Cruz Rivera
Victoria Ngai
Xiaoxuan Liu
Nature Communications (2024)
The Consolidated Standards of Reporting Trials extension for Artificial Intelligence interventions (CONSORT-AI) was published in September 2020. Since its publication, several randomised controlled trials (RCTs) of AI interventions have been published, but the completeness and transparency of their reporting are unknown. This systematic review assesses the completeness of reporting of AI RCTs following publication of CONSORT-AI and provides a comprehensive summary of RCTs published in recent years. 65 RCTs were identified, mostly conducted in China (37%) and USA (18%). Median concordance with CONSORT-AI reporting was 90% (IQR 77–94%), although only 10 RCTs explicitly reported its use. Several items were consistently under-reported, including algorithm version, accessibility of the AI intervention or code, and references to a study protocol. Only 3 of 52 included journals explicitly endorsed or mandated CONSORT-AI. Despite a generally high concordance amongst recent AI RCTs, some AI-specific considerations remain systematically poorly reported. Further encouragement of CONSORT-AI adoption by journals and funders may enable more complete adoption of the full CONSORT-AI guidelines.
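The median and interquartile range reported above summarise one concordance score per trial. A minimal Python sketch of that summary, assuming (hypothetically) that a trial's score is the percentage of applicable CONSORT-AI items it reports; the trial counts below are invented for illustration and are not the review's data.

```python
import statistics
from typing import List, Tuple

def concordance(reported: int, applicable: int) -> float:
    """Percentage of applicable checklist items that a trial reports (assumed scoring rule)."""
    if applicable <= 0 or not 0 <= reported <= applicable:
        raise ValueError("need 0 <= reported <= applicable and applicable > 0")
    return 100.0 * reported / applicable

def summarise(scores: List[float]) -> Tuple[float, float, float]:
    """Median and interquartile range (Q1, Q3) of per-trial concordance scores."""
    if len(scores) < 2:
        raise ValueError("need at least two trials to compute quartiles")
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), q1, q3

# Hypothetical trials as (items reported, items applicable); not data from the review.
trials = [(21, 30), (24, 30), (27, 30), (28, 30), (29, 30)]
median, q1, q3 = summarise([concordance(r, a) for r, a in trials])
print(f"median {median:.0f}% (IQR {q1:.0f}-{q3:.0f}%)")  # prints: median 90% (IQR 80-93%)
```

Quartile conventions differ between tools; `method="inclusive"` is one choice among several and may not match the convention used in the review.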
Hardware-Assisted Fault Isolation: Going Beyond the Limits of Software-Based Sandboxing
Anjo Vahldiek-Oberwagner
Tal Garfinkel
Deian Stefan
Michael LeMay
Evan Johnson
Mohammadkazem Taram
Chris Fallin
Ravi Sahita
Joey Rudek
Shravan Narayan
Dean Tullsen
IEEE Micro (2024)
Hardware-assisted Fault Isolation (HFI) is a minimal extension to current processors that supports secure, flexible, and efficient in-process isolation. HFI addresses the limitations of software-based fault isolation (SFI) systems, including runtime overheads, limited scalability, vulnerability to Spectre attacks, and limited compatibility with existing code. HFI can be seamlessly integrated into existing SFI systems (e.g., WebAssembly) or can directly sandbox unmodified native binaries. To ease adoption, HFI proposes incremental changes to existing high-performance processors.
Floods are one of the most common natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow gauge networks. Accurate and timely warnings are critical for mitigating flood risks, but hydrological simulation models typically must be calibrated to long data records in each watershed. Here we show that AI-based forecasting achieves reliability in predicting extreme riverine events in ungauged watersheds at up to a 5-day lead time that is similar to or better than the reliability of nowcasts (0-day lead time) from a current state-of-the-art global modeling system (the Copernicus Emergency Management Service Global Flood Awareness System). Additionally, we achieve accuracies over 5-year return period events that are similar to or better than current accuracies over 1-year return period events. This means that AI can provide flood warnings earlier and over larger and more impactful events in ungauged basins. The model developed in this paper was incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.
While large, generative, multilingual models are rapidly being developed and deployed, their safety and fairness evaluations primarily hinge on resources collected in the English language and some limited translations. This has been shown to be insufficient: such resources severely lack the nuances of unsafe language and stereotypes that are prevalent in different languages and in the geographical pockets where they occur. Gathering these resources at scale across varied languages and regions is also challenging, as it requires expansive sociolinguistic knowledge and can be prohibitively expensive. We utilize an established methodology of coupling LLM generations with distributed annotations to overcome these gaps and create the resource SeeGULL Multilingual, spanning 20 languages across 23 regions.
FieldSwap: Data Augmentation for Effective Form-Like Document Extraction
Seth Ebner
IEEE 40th International Conference on Data Engineering (ICDE) (2024), pp. 4722-4732
Extracting structured data from visually rich documents like invoices, receipts, financial statements, and tax forms is key to automating many business workflows. However, building extraction models in this domain often demands a large collection of high-quality training examples. To address this challenge, we introduce FieldSwap, a novel data augmentation technique specifically designed for such extraction problems. FieldSwap generates synthetic training examples by replacing key phrases indicative of one field with those corresponding to another. Our experiments on five diverse datasets demonstrate that incorporating FieldSwap-augmented data into the training process can enhance model performance by 1–11 F1 points, particularly when dealing with limited training data (10–100 documents). Additionally, we propose algorithms for automatically inferring key phrases from the training data. Our findings indicate that FieldSwap is effective regardless of whether key phrases are manually provided by human experts or inferred automatically.
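A minimal sketch of the key-phrase swap described above, under simplifying assumptions that are not from the paper: a document is a flat list of text segments, unlabeled segments hold key phrases, labeled segments hold field values, and `Segment`, `field_swap`, and the field names are hypothetical. The paper's actual document representation and its key-phrase inference are not shown.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Segment:
    text: str
    label: Optional[str] = None  # field name for value spans; None for other text

def field_swap(doc: List[Segment], key_phrases: Dict[str, str],
               src: str, dst: str) -> Optional[List[Segment]]:
    """Synthesize a dst-field example from a document that contains a src field.

    Rewrites src's key phrase to dst's key phrase in unlabeled text and relabels
    src values as dst. Returns None when there is nothing to swap or when dst
    already appears (which would leave two competing dst values).
    """
    src_kp, dst_kp = key_phrases[src], key_phrases[dst]
    has_src_phrase = any(s.label is None and src_kp in s.text for s in doc)
    has_dst_value = any(s.label == dst for s in doc)
    if not has_src_phrase or has_dst_value:
        return None
    swapped = []
    for seg in doc:
        if seg.label is None:
            swapped.append(replace(seg, text=seg.text.replace(src_kp, dst_kp)))
        elif seg.label == src:
            swapped.append(replace(seg, label=dst))
        else:
            swapped.append(seg)  # other fields are left untouched
    return swapped

# Hypothetical invoice fragment and key phrases.
invoice = [
    Segment("Invoice Date:"), Segment("2024-01-05", "invoice_date"),
    Segment("Total:"), Segment("$40.00", "total"),
]
phrases = {"invoice_date": "Invoice Date:", "due_date": "Due Date:"}
synthetic = field_swap(invoice, phrases, "invoice_date", "due_date")
```

Skipping documents that already contain a `dst` value is a design choice of this sketch, made so that each synthetic example has a single unambiguous target value.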
Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
Shuo Yang
Aniruddh Nath
Yang Liu
Li Wei
Shawn Andrews
Maciej Kula
Jarrod Kahn
Zhe Zhao
Lichan Hong
Knowledge Distillation (KD) is a powerful approach for compressing large models into smaller, more efficient models, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring the consistent and reliable generation of high-quality teacher labels from continuous data streams.
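The abstract describes a distillation system rather than a specific loss. For orientation only, here is a generic distillation objective of the kind commonly used for binary recommendation targets such as clicks, assuming the student is trained on a weighted mix of the hard label and the teacher's predicted probability; `alpha` and all function names are illustrative, and this is not the paper's system.

```python
import math
from typing import Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce(target: float, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy against a hard (0/1) or soft (0..1) target."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def distillation_loss(labels: Sequence[float], student_logits: Sequence[float],
                      teacher_probs: Sequence[float], alpha: float = 0.5) -> float:
    """Mean over examples of alpha * BCE(hard label) + (1 - alpha) * BCE(teacher label)."""
    if not (len(labels) == len(student_logits) == len(teacher_probs) > 0):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, z, t in zip(labels, student_logits, teacher_probs):
        p = sigmoid(z)  # student's predicted probability
        total += alpha * bce(y, p) + (1.0 - alpha) * bce(t, p)
    return total / len(labels)
```

Setting `alpha=1.0` recovers ordinary training on hard labels; lower values lean more on the teacher's soft labels.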
Verifying credentials, such as educational degrees, professional licenses, and permits, is a crucial yet challenging task for organizations globally. Traditional verification methods often rely on third-party vendors, introducing vulnerabilities like bias, security breaches, and privacy risks. While blockchain technology offers a promising solution for credential management, existing approaches often store sensitive credential data off-chain in centralized databases or InterPlanetary File System (IPFS), leaving them susceptible to data breaches and loss.
This paper presents a novel, privacy-preserving credential verification system built on a permissioned blockchain network. This system, implemented using the Hyperledger Fabric framework, offers several key advantages over traditional methods, including enhanced security and improved privacy. By leveraging cryptographic techniques, the system ensures robust and privacy-preserving storage of credentials directly on the blockchain. This eliminates the reliance on vulnerable off-chain storage and mitigates associated risks. Furthermore, our analysis of a common credential dataset demonstrates the practical feasibility and cost-effectiveness of our solution, suggesting its potential for widespread adoption. By addressing the limitations of both traditional and existing blockchain-based approaches, our system provides a robust, secure, and efficient solution for credential management in diverse sectors.
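The abstract does not specify which cryptographic techniques are used. One common privacy-preserving pattern is a salted hash commitment, sketched below with a plain dictionary standing in for the ledger. This is an assumption for illustration, not the paper's scheme and not Hyperledger Fabric's chaincode API; unlike the described system, it records only a digest rather than the credential itself.

```python
import hashlib
import hmac
import json
import secrets
from typing import Dict

Ledger = Dict[str, str]  # hypothetical stand-in for on-chain state: credential id -> hex digest

def _digest(credential: dict, salt: bytes) -> str:
    """SHA-256 over the salt plus a canonical JSON encoding of the credential."""
    payload = json.dumps(credential, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(salt + payload).hexdigest()

def issue(ledger: Ledger, credential_id: str, credential: dict) -> bytes:
    """Record only a salted digest; the salt is returned to the holder, not stored."""
    salt = secrets.token_bytes(16)
    ledger[credential_id] = _digest(credential, salt)
    return salt

def verify(ledger: Ledger, credential_id: str, credential: dict, salt: bytes) -> bool:
    """Check a presented credential and salt against the recorded digest."""
    stored = ledger.get(credential_id)
    return stored is not None and hmac.compare_digest(stored, _digest(credential, salt))
```

The random salt keeps an observer of the ledger from confirming a guessed credential by hashing it, and `hmac.compare_digest` avoids timing differences when comparing digests.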
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu
Zhanpeng Zeng
Vikas Singh
International Conference on Machine Learning (2024)
Transformers are the backbone of powerful foundation models for many Vision and Natural Language Processing tasks. But their compute and memory/storage footprint is large, so serving such models is expensive, often requiring high-end hardware. To mitigate this difficulty, Post-Training Quantization seeks to modify a pre-trained model and quantize it to eight bits or lower, significantly boosting compute/memory/latency efficiency. Such models have been successfully quantized to four bits with some performance loss. In this work, we outline a simple scheme to quantize Transformer-based models to just two bits (plus some overhead) with only a small drop in accuracy. Key to our formulation is a concept borrowed from harmonic analysis called Fusion Frames. Our main finding is that the quantization must take place not in the original weight space, but instead in the Fusion Frame representations. If quantization is interpreted as the addition of noise, our casting of the problem allows invoking an extensive body of known consistent recovery and noise robustness guarantees. Further, if desired, denoising filters are known in closed form. We show empirically, via a variety of experiments, that (almost) two-bit quantization for Transformer models promises sizable efficiency gains.
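The core idea of quantizing in a transformed representation and then mapping back can be illustrated with a much simpler transform than Fusion Frames. The sketch below swaps in a randomized Hadamard rotation (an orthogonal change of basis) and a per-vector 2-bit uniform quantizer; both are substitutions chosen for brevity, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

def fwht(x: Sequence[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def rotate(x: Sequence[float], signs: Sequence[float]) -> List[float]:
    """Orthogonal transform: random sign flips, then the normalized Hadamard matrix."""
    n = len(x)
    return [v / math.sqrt(n) for v in fwht([s * v for s, v in zip(signs, x)])]

def unrotate(y: Sequence[float], signs: Sequence[float]) -> List[float]:
    """Inverse of rotate: the normalized Hadamard matrix is its own inverse."""
    n = len(y)
    return [s * v / math.sqrt(n) for s, v in zip(signs, fwht(y))]

def quantize_2bit(y: Sequence[float]) -> List[float]:
    """Round each entry to the nearest of 4 levels (-1.5, -0.5, 0.5, 1.5) * scale."""
    scale = max(abs(v) for v in y) / 1.5 or 1.0  # fall back to 1.0 for an all-zero input
    return [(min(max(math.floor(v / scale + 2), 0), 3) - 1.5) * scale for v in y]

def quantize_in_transformed_space(w: Sequence[float], seed: int = 0) -> List[float]:
    """Rotate, quantize to 2 bits, and rotate back to the original weight space."""
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in w]
    return unrotate(quantize_2bit(rotate(w, signs)), signs)
```

A rotation tends to spread a few large weights across all coordinates, which is one intuition for why quantizing after a transform can lose less than quantizing the raw weights; the paper's Fusion Frame formulation and its recovery guarantees go well beyond this sketch.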
Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization
Advances in Neural Information Processing Systems (NeurIPS) (2024) (to appear)
Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, these have evolved rather independently, with IO recently receiving more research attention. This paper seeks to bridge this gap by comprehensively comparing the performance of representative IO and ES techniques, both in isolation and in combination, on a diverse set of challenging tasks. Our findings reveal that intelligently reusing model-generated input-output pairs obtained from evaluating prompts on the validation set as exemplars consistently improves performance over IO methods but is currently under-investigated. We also find that despite the recent focus on IO, how we select exemplars can outweigh how we optimize instructions, with ES strategies as simple as random search outperforming state-of-the-art IO methods with seed instructions without any optimization. Moreover, we observe synergy between ES and IO, with optimal combinations surpassing individual contributions. We conclude that studying exemplar selection as a standalone method and its optimal combination with instruction optimization remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models.
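A minimal sketch of exemplar selection by random search, the simple ES baseline mentioned above. The `score` callable is a hypothetical stand-in for building a prompt with the chosen exemplars and measuring validation accuracy with an LLM; the pool would hold input-output pairs, for example model-generated ones collected while evaluating prompts on the validation set.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Exemplar = Tuple[str, str]  # (input, output) pair

def random_search_exemplars(
    pool: Sequence[Exemplar],
    k: int,
    n_trials: int,
    score: Callable[[List[Exemplar]], float],
    seed: int = 0,
) -> Tuple[Optional[List[Exemplar]], float]:
    """Sample n_trials random k-exemplar sets and keep the best-scoring one."""
    if not 0 < k <= len(pool):
        raise ValueError("k must be in 1..len(pool)")
    rng = random.Random(seed)
    best: Optional[List[Exemplar]] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        candidate = rng.sample(list(pool), k)  # k distinct exemplars, random order
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score
```

Each trial costs one validation run in practice, so `n_trials` is effectively the evaluation budget.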
Searching for Dermatology Information Online using Images vs Text: a Randomized Study
Jay Hartford
Amit Talreja
Natalie Salaets
Kimberley Raiford
Jay Nayar
Dounia Berrada
Harsh Kharbanda
Lou Wang
Peggy Bui
medRxiv (2024)
Background: Skin conditions are extremely common worldwide and are an important cause of both anxiety and morbidity. Since the advent of the internet, individuals have used text-based search (e.g., “red rash on arm”) to learn more about concerns on their skin, but this process is often hindered by the inability to accurately describe the lesion’s morphology. In this study, we surveyed respondents’ experiences with image-based search compared to the traditional text-based search experience.
Methods: An internet-based survey was conducted to evaluate the experience of text-based vs image-based search for skin conditions. We recruited respondents from an existing cohort of volunteers in a commercial survey panel; survey respondents who met the inclusion/exclusion criteria, including willingness to take photos of a visible concern on their body, were enrolled. Respondents were asked to use the Google mobile app to conduct both regular text-based search (Google Search) and image-based search (Google Lens) for their concern, with the order of text vs. image search randomized. Satisfaction with each search experience along six different dimensions was recorded and compared, and respondents’ preferences between the search types along these same six dimensions were also recorded.
Results: 372 respondents were enrolled in the study, with 44% self-identifying as women, 86% as White and 41% over age 45. The proportions of respondents who were at least moderately familiar with searching for skin conditions using text-based search versus image-based search were 81.5% and 63.5%, respectively. After using both search modalities, respondents were highly satisfied with both image-based and text-based search, with >90% at least somewhat satisfied in each dimension and no significant differences seen between text-based and image-based search when examining the responses on an absolute scale per search modality. When asked to directly rate their preferences in a comparative way, survey respondents preferred image-based search over text-based search in 5 out of 6 dimensions, with an absolute 9.9% more preferring image-based search over text-based search overall (p=0.004). 82.5% (95% CI 78.2–86.3) reported a preference to leverage image-based search (alone or in combination with text-based search) in future searches. Of those who would prefer to use a combination of both, 64% indicated they would like to start with image-based search, suggesting that image-based search may be the preferred entry point for skin-related searches.
Conclusion: Despite being less familiar with image-based search upon study inception, survey respondents generally preferred image-based search to text-based search and overwhelmingly wanted to include this in future searches. These results suggest the potential for image-based search to play a key role in people searching for information regarding skin concerns.
Background. Wildfire research uses ensemble methods to analyze fire behaviors and assess uncertainties. Nonetheless, current research methods are confined either to simple models or to complex simulations with limitations. Modern computing tools could allow for efficient, high-fidelity ensemble simulations.
Aims. This study proposes a high-fidelity ensemble wildfire simulation framework for studying wildfire behavior, ML tasks, fire-risk assessment, and uncertainty analysis.
Methods. In this research, we present a simulation framework that integrates the Swirl-Fire large-eddy simulation tool for wildfire predictions with the Vizier optimization platform for automated run-time management of ensemble simulations and large-scale batch processing. All simulations are executed on tensor processing units to enhance computational efficiency.
Key results. A dataset of 117 simulations is created, each with 1.35 billion mesh points. The simulations are compared to existing experimental data and show good agreement in terms of fire rate of spread. Fire acceleration, mean rate of spread, and fireline intensity are also computed.
Conclusions. Strong coupling between these two parameters is observed for fire spread and intermittency. A critical Froude number that delineates the transition of fires from plume-driven to convection-driven is identified and confirmed with literature observations.
Implications. The ensemble simulation framework efficiently facilitates parametric wildfire studies.
Help and The Social Construction of Access: A Case-Study from India
Vaishnav Kameswaran
Jerry Young Robinson
Nithya Sambasivan
Gaurav Aggarwal
Proceedings of ASSETS 2024, ACM (2024)
A goal of accessible technology (AT) design is often to increase independence, i.e., to enable people with disabilities to accomplish tasks on their own without help. Recent work challenges this view with "interdependence", a framing that recognizes mutual dependencies as critical to addressing the access needs of people with disabilities. However, empirical evidence examining interdependence is limited to the Global North; we address this gap, using interdependence as an analytical frame to understand how people with visual impairments (PVI) in India navigate indoor environments. Through interviews with PVI and their companions and a video-diary study, we find that help is a central way for PVI to circumvent issues of social and structural inaccess, and that help itself necessitates work. We uncover three kinds of interdependencies: 1) self-initiated, 2) serendipitous, and 3) obligatory, and discuss the implications these interdependencies have for AT design in the Global South.
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
Christina Tsalicoglou
Fabian Manhardt
Michael Niemeyer
3DV 2024 (2024)
The ability to generate highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises whether this can also be achieved for generating 3D content from such text prompts. To this end, a new line of methods has recently emerged that tries to harness diffusion models, trained on 2D images, to supervise 3D model generation using view-dependent prompts. While achieving impressive results, these methods have two major drawbacks. First, rather than the commonly used 3D meshes, they generate neural radiance fields (NeRFs), making them impractical for most real applications. Second, these approaches tend to produce over-saturated models, giving the output a cartoonish look. Therefore, in this work we propose a novel method for generating highly realistic-looking 3D meshes. To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction. In addition, we propose a novel way to fine-tune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.
Beyond the Code: AI Regulations as the Secret Compass of Engineering Managers
Proceedings of the American Society for Engineering Management 2024 International Annual Conference (2024)
Technology is a product of society. As technology evolves, the norms governing it must mature to enable its proper use within society. Interest in Artificial Intelligence (AI) has surged following the introduction of ChatGPT. Firms, both large and small, are competing to develop new products and solutions involving AI. Amid these developments, leading corporations such as Google and Microsoft have proactively committed to upholding responsible innovation in AI development. Governments worldwide are responding by creating guidelines and regulations in the field. Notably, in March 2024, the United Nations General Assembly (UNGA) adopted a landmark resolution on AI.
At the heart of these developments in AI are engineering managers who leverage technical advances to build products and services that create value. To effectively harness AI for human benefit, engineering managers must be aware of the evolving regulations governing AI. Some regulations, such as the Digital Markets Act (DMA) and the General Data Protection Regulation (GDPR), have far-reaching consequences for organizations globally. A working knowledge of these statutory requirements will enable engineering managers to identify the opportunities and constraints in leveraging AI technology while building products and services. It will allow them to make informed decisions about data collection methods, model training processes, the deployment of AI systems, and the metrics for evaluating them. At scale, this knowledge can become a competitive advantage for the firms they work in, as explored through real-world examples in this paper.
PRewrite: Prompt Rewriting with Reinforcement Learning
Qiaozhu Mei
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (2024) (to appear)
Prompt engineering is critical for the development of LLM-based applications. However, it is usually done manually in a "trial and error" fashion that can be time-consuming, ineffective, and sub-optimal. Even for prompts that seemingly work well, there is always a lingering question: can the prompts be made better with further modifications?
To address these problems, we investigate automated prompt engineering in this paper. Specifically, we propose PRewrite, an automated method to rewrite an under-optimized prompt into a more effective prompt. We instantiate the prompt rewriter using an LLM. The rewriter LLM is trained using reinforcement learning to optimize performance on a given downstream task. We conduct experiments on diverse benchmark datasets, which demonstrate the effectiveness of PRewrite.