Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 10129 publications
The effect of uncertainty in humidity and model parameters on the prediction of contrail energy forcing
Marc Shapiro
Zebediah Engberg
Tharun Sankar
Marc E.J. Stettler
Roger Teoh
Ulrich Schumann
Susanne Rohs
Erica Brand
Environmental Research Communications, 6 (2024), pp. 095015
Previous work has shown that while the net effect of aircraft condensation trails (contrails) on the climate is warming, the exact magnitude of the energy forcing per meter of contrail remains uncertain. In this paper, we explore the skill of a Lagrangian contrail model (CoCiP) in identifying flight segments with high contrail energy forcing. We find that its skill is greater than that of climatological predictions alone, even accounting for uncertainty in weather fields and model parameters. We estimate the uncertainty due to humidity by using the ensemble ERA5 weather reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) as Monte Carlo inputs to CoCiP. We unbias the ERA5 humidity data and correct its under-dispersion by forcing a match to the distribution of in situ humidity measurements taken at cruising altitude. We take CoCiP energy forcing estimates calculated using one of the ensemble members as a proxy for ground truth, and report the skill of CoCiP in identifying segments with large positive proxy energy forcing. We further estimate the uncertainty due to model parameters in CoCiP by performing Monte Carlo simulations with CoCiP model parameters drawn from uncertainty distributions consistent with the literature. When CoCiP outputs are averaged over seasons to form climatological predictions, the skill in predicting the proxy is 44%, while the skill of per-flight CoCiP outputs is 84%. If these results carry over to the true (unknown) contrail energy forcing, they indicate that per-flight energy forcing predictions can reduce the number of potential contrail avoidance route adjustments by 2x, hence reducing both the cost and fuel impact of contrail avoidance.
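As a rough illustration of the Monte Carlo setup described above (a sketch under stated assumptions, not the CoCiP implementation: the energy_forcing function, thresholds, and distributions below are hypothetical), one can propagate ensemble humidity fields and sampled model parameters through a simplified per-segment model and score how often the prediction flags the same segments as a proxy ground truth:

```python
# Hedged sketch, not the authors' CoCiP code: Monte Carlo propagation of
# humidity and parameter uncertainty through a hypothetical per-segment
# energy-forcing model, followed by a simple agreement score.
import numpy as np

rng = np.random.default_rng(0)

def energy_forcing(humidity, params):
    # Hypothetical stand-in for a contrail model: forcing grows once humidity
    # exceeds an ice-supersaturation-like threshold, scaled by a model parameter.
    return params["scale"] * np.maximum(humidity - params["rhi_threshold"], 0.0)

n_segments, n_ensemble = 1000, 10
# Ensemble humidity fields (e.g. one column per ERA5 ensemble member).
humidity = rng.normal(loc=0.95, scale=0.08, size=(n_segments, n_ensemble))
# Model parameters drawn from assumed uncertainty distributions.
param_draws = [{"scale": rng.lognormal(0.0, 0.3), "rhi_threshold": rng.normal(1.0, 0.02)}
               for _ in range(n_ensemble)]

# One ensemble member serves as the proxy "ground truth".
proxy_ef = energy_forcing(humidity[:, 0], param_draws[0])
# The mean over the remaining Monte Carlo draws is the per-segment prediction.
pred_ef = np.mean([energy_forcing(humidity[:, k], param_draws[k])
                   for k in range(1, n_ensemble)], axis=0)

# Score agreement on which segments have "large positive" proxy energy forcing.
flagged_pred = pred_ef > np.percentile(pred_ef, 80)
flagged_proxy = proxy_ef > np.percentile(proxy_ef, 80)
print(f"agreement with proxy: {np.mean(flagged_pred == flagged_proxy):.2f}")
```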
Prompt-Based Label-Aware Framework for Few-Shot Multi-Label Text Classification
Thanakorn Thaminkaew
Peerapon Vateekul
IEEE Access, 12 (2024), pp. 28310-28322
Prompt-based learning has demonstrated remarkable success in few-shot text classification, outperforming the traditional fine-tuning approach. This method transforms a text input into a masked language modeling prompt using a template, queries a fine-tuned language model to fill in the mask, and then uses a verbalizer to map the model's output to a predicted class. Previous prompt-based text classification approaches were primarily designed for multi-class classification, taking advantage of the fact that the classes are mutually exclusive and one example belongs to only one class. However, these assumptions do not hold in the context of multi-label text classification, where labels often exhibit correlations with each other. Therefore, we propose a Prompt-based Label-Aware framework for Multi-Label text classification (PLAML) that addresses these challenges. Specifically, PLAML enhances prompt-based learning with three proposed techniques to improve the overall performance of multi-label classification: (i) a token weighting algorithm that considers the correlations between labels, (ii) a template for augmenting training samples, making the training process label-aware, and (iii) a dynamic threshold mechanism that refines the prediction condition of each label. Extensive experiments on few-shot text classification across multiple datasets in various languages show that PLAML outperforms other baseline methods. We also analyze the effect of each proposed technique to better understand its suitability for the multi-label setting.
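As a minimal sketch of the prompt-template-plus-verbalizer recipe described above (my own illustration, not the PLAML code: the template, verbalizer tokens, and thresholds are hypothetical, and PLAML additionally learns token weights and dynamic thresholds), masked-language-model scores for label-indicative tokens can be turned into a multi-label prediction with per-label thresholds:

```python
# Hedged sketch, not the authors' implementation: a template turns the input
# into a masked-LM prompt, a verbalizer maps mask-token scores to labels, and
# a per-label threshold (learned dynamically in PLAML) decides which labels fire.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
mask = fill_mask.tokenizer.mask_token

# Hypothetical verbalizer and thresholds, for illustration only.
verbalizer = {"sports": "sports", "politics": "politics", "finance": "money"}
thresholds = {"sports": 0.05, "politics": 0.05, "finance": 0.03}

def predict_labels(text):
    prompt = f"{text} This text is about {mask}."                 # template
    scores = {r["token_str"].strip(): r["score"]
              for r in fill_mask(prompt, targets=list(verbalizer.values()))}
    return [label for label, token in verbalizer.items()
            if scores.get(token, 0.0) > thresholds[label]]

print(predict_labels("The central bank raised interest rates before the election."))
```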
“They only care to show us the wheelchair”: disability representation in text-to-image AI models
Avery Mack
Rida Qadri
CHI Conference on Human-Computer Interaction (2024)
This paper reports on disability representation in images output from text-to-image (T2I) generative AI systems. Through eight focus groups with 25 people with disabilities, we found that models repeatedly presented reductive archetypes for different disabilities. Often these representations reflected broader societal stereotypes and biases, which our participants were concerned to see reproduced through T2I. Our participants discussed further challenges with using these models, including the current reliance on prompt engineering to reach satisfactorily diverse results. Finally, they offered suggestions for how to improve disability representation, with solutions like showing multiple, heterogeneous images for a single prompt and including the prompt with generated images. Our discussion reflects on tensions and tradeoffs we found among the diverse perspectives shared, to inform future research on representation-oriented generative AI system evaluation metrics and development processes.
Creative ML Assemblages: The Interactive Politics of People, Processes, and Products
Ramya Malur Srinivasan
Katharina Burgdorf
Jennifer Lena
ACM Conference on Computer Supported Cooperative Work and Social Computing (2024) (to appear)
Creative ML tools are collaborative systems that afford artistic creativity through their myriad interactive relationships. We propose using "assemblage thinking" to support analyses of creative ML by approaching it as a system in which the elements of people, organizations, culture, practices, and technology constantly influence each other. We model these interactions as "coordinating elements" that give rise to the social and political characteristics of a particular creative ML context, and call attention to three dynamic elements of creative ML whose interactions provide unique context for the social impact of a particular system: people, creative processes, and products. As creative assemblages are highly contextual, we present these as analytical concepts that computing researchers can adapt to better understand the functioning of a particular system or phenomenon and identify intervention points to foster desired change. This paper contributes to theorizing interactions with AI in the context of art, and how these interactions shape the production of algorithmic art.
The effect of regularizers such as weight decay when training deep neural networks is not well understood. We study the influence of weight decay as well as $L_2$-regularization when training neural network models in which parameter matrices interact multiplicatively. This combination is of particular interest as this parametrization is common in attention layers, the workhorse of transformers. Here, key-query as well as value-projection parameter matrices are multiplied directly with each other: $W_K^\top W_Q$ and $PW_V$.
We extend previous results and show, on one hand, that any local minimum of an $L_2$-regularized loss of the form $L(AB^\top) + \lambda (\|A\|^2 + \|B\|^2)$ coincides with a minimum of the nuclear norm-regularized loss $L(AB^\top) + \lambda\|AB^\top\|_*$, and, on the other hand, that the two losses become identical exponentially quickly during training. We thus complement existing works linking $L_2$-regularization with low-rank regularization, and in particular explain why such regularization on the matrix product affects early stages of training.
Based on these theoretical insights, we verify empirically that the key-query and value-projection matrix products $W_K^\top W_Q$ and $PW_V$ within attention layers, when optimized with weight decay, as is usually done in vision tasks and language modelling, indeed exhibit a significant reduction in rank, even in fully online training.
We find that, in accordance with existing work, inducing low rank in attention matrix products can damage language model performance, and we observe advantages when decoupling weight decay in attention layers from the rest of the parameters.
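A small numerical check of the variational fact behind this equivalence (my own sketch, not the paper's code): for a balanced factorization built from the SVD, the Frobenius penalty $\|A\|^2 + \|B\|^2$ equals twice the nuclear norm of the product, which is why penalizing the factors acts like low-rank regularization on $AB^\top$:

```python
# Hedged numerical illustration of the variational characterization:
# for balanced factors A, B with A B^T = M, ||A||_F^2 + ||B||_F^2 = 2 ||M||_*.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 5))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
A = U @ np.diag(np.sqrt(s))      # balanced factors built from the SVD
B = Vt.T @ np.diag(np.sqrt(s))   # so that A @ B.T reconstructs M

frobenius_penalty = np.linalg.norm(A, "fro")**2 + np.linalg.norm(B, "fro")**2
nuclear_norm = s.sum()

print(np.allclose(A @ B.T, M))                           # True
print(np.isclose(frobenius_penalty, 2 * nuclear_norm))   # True
```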
Meta-Manager: A Tool for Collecting and Exploring Meta Information about Code
Amber Horvath
Brad A. Myers
CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems (2024)
Modern software engineering is in a state of flux. With more development utilizing AI code generation tools and continued reliance on online programming resources, understanding code and the original intent behind it is becoming more important than ever. To this end, we have developed the “Meta-Manager”, a Visual Studio Code extension, with a supplementary browser extension, that automatically collects and organizes changes made to code while keeping track of the provenance of each part of the code, including code that has been copy-pasted from popular online programming resources. These sources and subsequent changes are represented in the editor and can be explored using searching and filtering mechanisms to help developers answer historically hard-to-answer questions about code, its provenance, and its design rationale. In our evaluation of Meta-Manager, we found developers were successfully able to use it to answer otherwise unanswerable questions about an unfamiliar code base.
LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals
Arjun Karpur
Guilherme Perrotta
Ricardo Martin-Brualla
Proc. 3DV'24 (2024) (to appear)
Finding localized correspondences across different images of the same object is crucial to understanding its geometry. In recent years, this problem has seen remarkable progress with the advent of deep learning-based local image features and learnable matchers. Still, learnable matchers often underperform when there exist only small regions of co-visibility between image pairs (i.e., wide camera baselines). To address this problem, we leverage recent progress in coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks and enhances their capabilities by integrating noisy, estimated 3D signals to boost correspondence estimation. When integrating 3D signals into the matcher model, we show that a suitable positional encoding is critical to effectively make use of the low-dimensional 3D information. We experiment with two different 3D signals, normalized object coordinates and monocular depth estimates, and evaluate our method on large-scale (synthetic and real) datasets containing object-centric image pairs across wide baselines. We observe strong feature matching improvements compared to 2D-only methods, with up to +6% total recall and +28% precision at fixed recall. Additionally, we demonstrate that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs, with up to an 8.6% improvement compared to the 2D-only approach.
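A hedged sketch of one common way to integrate such low-dimensional 3D signals (an assumption of mine, not the LFM-3D architecture: the sinusoidal encoding, dimensions, and concatenation below are illustrative): encode a per-keypoint depth estimate with sinusoidal features and concatenate it with the 2D descriptors fed to the matcher:

```python
# Hedged sketch: lift a 1-D per-keypoint signal (e.g. monocular depth) into a
# higher-dimensional positional encoding and append it to 2D local descriptors.
import numpy as np

def sinusoidal_encoding(x, num_freqs=8):
    """Map values in [0, 1] to a 2*num_freqs-dim encoding per input dimension."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi          # geometric frequencies
    angles = x[..., None] * freqs                        # (..., D, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

num_keypoints, desc_dim = 512, 128
descriptors = np.random.randn(num_keypoints, desc_dim)   # 2D local features
depth = np.random.rand(num_keypoints, 1)                 # normalized depth per keypoint

features = np.concatenate([descriptors, sinusoidal_encoding(depth)], axis=-1)
print(features.shape)   # (512, 144): combined input to the matcher network
```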
TRINDs: Assessing the Diagnostic Capabilities of Large Language Models for Tropical and Infectious Diseases
Steve Adudans
Oluwatosin Akande
Chintan Ghate
Sylvanus Aitkins
Geoffrey Siwo
Lynda Osadebe
Nenad Tomašev
Eric Ndombi
Neglected tropical diseases (NTDs) and infectious diseases disproportionately affect the poorest regions of the world. While large language models (LLMs) have shown promise for medical question answering, there is limited work focused specifically on tropical and infectious diseases. We introduce TRINDs, a dataset of 52 tropical and infectious diseases with demographic and semantic clinical and consumer augmentations. We evaluate various contexts and counterfactual locations to understand their influence on LLM performance. Results show that LLMs perform best when provided with contextual information such as demographics, location, and symptoms. We also develop TRINDs-LM, a tool that enables users to enter symptoms and contextual information to receive a most likely diagnosis. In addition to the LLM evaluations, we conducted a human expert baseline study with 7 medical and public health experts to assess their accuracy in diagnosing tropical and infectious diseases. This work demonstrates methods for creating and evaluating datasets for testing and optimizing LLMs, and the use of a tool that could improve digital diagnosis and tracking of NTDs.
We present a new task and dataset, ScreenQA, for screen content understanding via question answering. Existing screen datasets focus either on structure- and component-level understanding, or on much higher-level composite tasks such as navigation and task completion. We attempt to bridge the gap between the two by annotating 86K question-answer pairs over the RICO dataset, in the hope of benchmarking screen reading comprehension capacity.
Productive Coverage: Improving the Actionability of Code Coverage
Gordon
Luka Kalinovcic
Marko Ivanković
Mateusz Lewko
Rene Just
Yana Kulizhskaya
ICSE-SEIP '24: Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (2024) (to appear)
Code coverage is an intuitive and established test adequacy measure. However, not all parts of the code base are equally important, and hence additional testing may be critical for some uncovered code, whereas it may not be worthwhile for other uncovered code. As a result, simply visualizing uncovered code is not reliably actionable. To make code coverage actionable and further improve code coverage in our codebase, we developed Productive Coverage, a novel approach to code coverage that guides developers to uncovered code that should be tested by (unit) tests. Specifically, Productive Coverage identifies uncovered code that is similar to existing code, which in turn is tested and/or frequently executed in production. We implemented and evaluated Productive Coverage for four programming languages (C++, Java, Go, and Python). The evaluation shows: (1) the developer sentiment, measured at the point of use, is strongly positive; (2) Productive Coverage meaningfully increases code coverage above a strong baseline; (3) Productive Coverage has no negative effect on code authoring efficiency; (4) Productive Coverage modestly improves code-review efficiency; (5) Productive Coverage directly improves code quality and prevents bugs from being introduced, in addition to improving test quality.
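A toy sketch of the underlying idea (my own illustration; the paper does not specify this implementation, and the similarity measure here is only a placeholder): rank uncovered functions by their similarity to code that is already tested or frequently executed in production, so the most likely productive coverage gaps surface first:

```python
# Hedged sketch, not Google's implementation: surface uncovered functions that
# most resemble tested or production-hot code, since those gaps likely matter.
from difflib import SequenceMatcher

tested_or_hot = {
    "parse_request": "def parse_request(raw): return json.loads(raw)",
    "encode_response": "def encode_response(obj): return json.dumps(obj)",
}
uncovered = {
    "parse_header": "def parse_header(raw): return raw.split(':')",
    "format_banner": "def format_banner(): return '*' * 40",
}

def similarity(a, b):
    # Placeholder similarity; a real system would use richer code representations.
    return SequenceMatcher(None, a, b).ratio()

ranked = sorted(
    uncovered,
    key=lambda name: max(similarity(uncovered[name], src) for src in tested_or_hot.values()),
    reverse=True,
)
print(ranked)   # functions most similar to tested/production-hot code first
```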
Detecting offensive content in text is an increasingly central challenge for both social-media platforms and AI-driven technologies. However, offensiveness remains a subjective phenomenon, as perspectives differ across sociodemographic characteristics, cultural norms, and moral values. This intricacy is largely ignored in current AI-focused approaches for detecting offensiveness and related concepts such as hate speech and toxicity. We frame the task of determining offensiveness as essentially a matter of moral judgment: deciding the boundaries of ethically wrong vs. right language to be used or generated within an implied set of sociocultural norms. In this paper, we investigate how judgments of offensiveness vary across diverse global cultural regions, and the crucial role of moral values in shaping these variations. Our findings highlight substantial cross-cultural differences in perceiving offensiveness, with moral concerns about Caring and Purity as the mediating factors driving these differences. These insights matter because AI safety protocols, shaped by human annotators' inputs and perspectives, embed moral values that do not align with notions of right and wrong in all contexts or for all individuals.
Unveiling Privacy Perspectives about Mobile Health Apps on a Large Scale
PETS workshop: Privacy, Safety and Trust for Mobile Health Apps (2024)
In this paper we study users' opinions about the privacy of their mobile health apps, based on what they write in app reviews in the 'Health & Fitness' category on the Google Play store. We identified 2832 apps in this category (with a minimum of 1K installs). Using NLP/LLM analyses, we find that 76% of these apps have at least some privacy-related reviews. In total, this yields over 164,000 reviews about privacy, from over 150 countries and in 25 languages. Our analysis identifies the top themes and offers an approximation of how widespread these issues are around the world. We show that the top four themes (Data Sharing and Exposure, Permission Requests, Location Tracking, and Data Collection) are issues of concern in over 70 countries. Our automatically generated thematic summaries reveal interesting aspects that deserve further research around user suspicions (unneeded data collection), user requests (more fine-grained control over data collection and data access), and user behavior (uninstalling apps).
Motivated by recent advances in large language models for NLP, we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of datasets matches the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series dataset, and works well across different forecasting history lengths, prediction lengths, and temporal granularities.
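A minimal sketch of the patching step such a patched-decoder forecaster relies on (an assumption-level illustration, not the actual model: the patch length is arbitrary, and embedding and decoding are omitted):

```python
# Hedged sketch: tokenize a 1-D history into fixed-length patches, which a
# decoder-only attention stack would embed and extend autoregressively.
import numpy as np

def patchify(series, patch_len):
    """Split a 1-D history into non-overlapping patches (dropping any remainder)."""
    n = (len(series) // patch_len) * patch_len
    return series[:n].reshape(-1, patch_len)

history = np.sin(np.linspace(0, 20, 512)) + 0.1 * np.random.randn(512)
patches = patchify(history, patch_len=32)        # (16, 32) patch "tokens"

# Each patch would be linearly embedded and fed to the decoder; at inference,
# the model emits the next patch (the forecast) and can be rolled forward for
# longer horizons, independent of the history length seen in training.
print(patches.shape)
```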
Believing Anthropomorphism: Examining the Role of Anthropomorphic Cues on User Trust in Large Language Models
Michelle Cohn
Femi Olanubi
Zion Mengesha
Daniel Padgett
ACM (Association for Computing Machinery) CHI Conference on Human Factors in Computing Systems (2024)
People now regularly interface with Large Language Models (LLMs) via speech and text (e.g., Bard) interfaces. However, little is known about the relationship between how users anthropomorphize an LLM system (i.e., ascribe human-like characteristics to it) and how they trust the information it provides. Participants (n=2,165; ranging in age from 18 to 90, from the United States) completed an online experiment in which they interacted with a pseudo-LLM that varied in modality (text only vs. speech + text) and grammatical person ("I" vs. "the system") in its responses. Results showed that the "speech + text" condition led to higher anthropomorphism of the system overall, as well as higher ratings of the accuracy of the information the system provides. Additionally, the first-person pronoun ("I") led to higher ratings of information accuracy and reduced risk ratings, but only in one context. We discuss the implications of these findings for the design of responsible human–generative AI experiences.
Now You See Me, Now You Don't: 'Poverty of the Stimulus' Problems and Arbitrary Correspondences in End-to-End Speech Models
Proceedings of the Second Workshop on Computation and Written Language (CAWL) 2024
End-to-end models for speech recognition and speech synthesis have many benefits, but we argue they also face a unique set of challenges not encountered in conventional multi-stage hybrid systems, which relied on the explicit injection of linguistic knowledge through resources such as phonemic dictionaries and verbalization grammars. These challenges include handling words with unusual grapheme-to-phoneme correspondences, converting between written forms like ‘12’ and spoken forms such as ‘twelve’, and contextual disambiguation of homophones or homographs. We describe the mitigation strategies that have been used for these problems in end-to-end systems, either implicitly or explicitly, and call out that the most commonly used mitigation techniques are likely incompatible with newly emerging approaches that use minimal amounts of supervised audio training data. We review best-of-both-world approaches that allow the use of end-to-end models combined with traditional linguistic resources, which we show are increasingly straightforward to create at scale, and close with an optimistic outlook for bringing speech technologies to many more languages by combining these strands of research.