Xiao Ma
Xiao works at the intersection of Machine Learning and Human-Computer Interaction.
Authored Publications
Abstract
Language models still struggle with moral reasoning, despite their impressive performance on many other tasks. In particular, the Moral Scenarios task in MMLU (Massive Multitask Language Understanding) is among the worst-performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experimental results show that our framework elicits counterfactual questions and answers from the model, which in turn improve accuracy on the Moral Scenarios task by 9-16% compared to other zero-shot baselines. Interestingly, unlike math reasoning tasks, zero-shot Chain-of-Thought (CoT) reasoning does not work out of the box, and even reduces accuracy by around 4% compared to direct zero-shot prompting. We further observe that with minimal human supervision, in the form of 5 few-shot examples, accuracy on the task can be improved to as much as 80%.
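The multi-stage counterfactual prompting described in the abstract can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions: the staging and prompt wording are not the paper's exact prompts, and call_model is a hypothetical stub for any text-completion API.

```python
def call_model(prompt: str) -> str:
    """Placeholder for a language-model completion call (hypothetical stub)."""
    raise NotImplementedError("Wire this up to an LLM API of your choice.")


def thought_experiment_judgment(scenario: str) -> str:
    # Stage 1: elicit counterfactual questions about the scenario.
    questions = call_model(
        f"Scenario: {scenario}\n"
        "Pose counterfactual questions that probe whether the action would "
        "still be acceptable if key details of the scenario were changed."
    )
    # Stage 2: have the model answer its own counterfactual questions.
    answers = call_model(
        f"Scenario: {scenario}\nCounterfactual questions:\n{questions}\n"
        "Answer each question briefly."
    )
    # Stage 3: ask for a final verdict grounded in the counterfactual analysis.
    return call_model(
        f"Scenario: {scenario}\nCounterfactual analysis:\n{answers}\n"
        "Given this analysis, is the action morally wrong or not wrong? "
        "Answer 'wrong' or 'not wrong'."
    )
```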
A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures
Allison Mercurio
Amanda Elizabeth Baughan
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (2023)
Abstract
Despite large gains in natural language understanding from large language models in recent years, voice assistants still often fail to meet user expectations. In this study, we conducted a mixed-methods analysis of how voice assistant failures affect users' trust in their voice assistants. To illustrate how users have experienced these failures, we contribute a crowdsourced dataset of 199 voice assistant failures, categorized across 12 failure sources. Drawing on interview and survey data, we find that certain failures, such as those caused by overcapturing users' input, derail user trust more than others. We additionally examine how failures affect users' willingness to rely on voice assistants for future tasks. Users often stop using their voice assistants for a specific task after it fails, but only for a short period before resuming similar usage. We demonstrate the importance of low-stakes tasks, such as playing music, for building trust after failures.
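As a rough illustration of how such a crowdsourced failure dataset might be represented and summarized, here is a hypothetical sketch. The field names, the "overcapture" category label, and the trust-impact scale are all assumptions for illustration, not the paper's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailureReport:
    description: str     # free-text account of the failure episode
    failure_source: str  # one of the 12 failure sources, e.g. "overcapture"
    trust_impact: float  # self-reported change in trust, e.g. on a Likert scale


def mean_trust_impact(reports: list[FailureReport]) -> dict[str, float]:
    """Average reported trust impact per failure source."""
    totals: Counter = Counter()
    counts: Counter = Counter()
    for report in reports:
        totals[report.failure_source] += report.trust_impact
        counts[report.failure_source] += 1
    return {source: totals[source] / counts[source] for source in counts}
```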
A Human-ML Collaboration Framework for Improving Video Content Reviews
Alex Beutel
Alex Koes
Meghana Deodhar
Yixin Cai
ACM CIKM 2022 Workshop on Human-in-the-Loop Data Curation (2022)
Abstract
We address the problem of localized in-video taxonomic human annotation in the video content moderation domain, where the goal is to identify video segments that violate granular policies, e.g., community guidelines on an online video platform. High-quality human labeling is critical for enforcement in content moderation. This is challenging due to information overload: raters need to apply a large taxonomy of granular policy violations, with ambiguous definitions, to relatively long videos within a limited review duration. Our key contribution is a novel human-machine learning (ML) collaboration framework aimed at maximizing the quality and efficiency of human decisions in this setting: human labels are used to train segment-level models, whose predictions are displayed as "hints" to human raters, indicating probable regions of the video with specific policy violations. The human-verified or corrected segment labels can further refine the model, creating a positive human-ML feedback loop. Experiments show improved quality of human video moderation decisions, and improved efficiency through more granular annotations submitted within a similar review duration, which enables a 5-8% AUC improvement in the hint-generation models.
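The feedback loop described above might be sketched as follows. Everything here is an illustrative assumption: predict_segments, review, train, and the confidence threshold are hypothetical interfaces standing in for the paper's actual system, not its implementation.

```python
from dataclasses import dataclass


@dataclass
class SegmentHint:
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    policy: str     # granular policy the model suspects is violated
    score: float    # model confidence in [0, 1]


def review_loop(model, rater, videos, rounds=3, threshold=0.5):
    """Alternate between model hinting and human review for a few rounds."""
    for _ in range(rounds):
        labeled_segments = []
        for video in videos:
            # The segment-level model proposes probable violation regions.
            hints = [h for h in model.predict_segments(video)
                     if h.score >= threshold]
            # The rater reviews the video with hints overlaid and returns
            # verified or corrected segment labels.
            labeled_segments.extend(rater.review(video, hints))
        # The corrected labels refine the model, closing the feedback loop.
        model.train(labeled_segments)
    return model
```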
Abstract
Voice assistants have been successfully adopted for simple, routine tasks, such as asking for the weather or setting an alarm. However, as people become more familiar with voice assistants, they may raise their expectations for more complex tasks, such as exploratory search (e.g., “What should I do when I visit Paris with kids? Oh, and ideally not too expensive.”). Compared to simple search tasks such as “How tall is the Eiffel Tower?”, which can be answered in a single shot, the response to an exploratory search is more nuanced, especially through voice-based assistants. In this paper, we outline four challenges in designing voice assistants that can better support exploratory search: addressing situationally induced impairments; working with mixed-modal interactions; designing for diverse populations; and meeting users’ expectations and gaining their trust. Addressing these challenges is important for developing more “intelligent” voice-based personal assistants.