Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10129 publications
    Preview abstract Prompting and in-context learning (ICL) have become efficient learning paradigms for large language models (LLMs). However, LLMs suffer from prompt brittleness and various bias factors in the prompt, including but not limited to the formatting, the choice verbalizers, and the ICL examples. To address this problem that results in unexpected performance degradation, calibration methods have been developed to mitigate the effects of these biases while recovering LLM performance. In this work, we first conduct a systematic analysis of the existing calibration methods, where we both provide a unified view and reveal the failure cases. Inspired by these analyses, we propose Batch Calibration (BC), a simple yet intuitive method that controls the contextual bias from the batched input, unifies various prior approaches, and effectively addresses the aforementioned issues. BC is zero-shot, inference-only, and incurs negligible additional costs. In the few-shot setup, we further extend BC to allow it to learn the contextual bias from labeled data. We validate the effectiveness of BC with PaLM 2-(S, M, L) and CLIP models and demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding and image classification tasks. View details
    Computational Methodologies for Understanding, Automating, and Evaluating User Interfaces
    Yuwen Lu
    Yue Jiang
    Christof Lutteroth
    Toby Jia-Jun Li
    Jeffery Nichols
    Wolfgang Stuerzlinger
    Preview abstract Building on the success of the first two workshops on user interfaces (UIs) at CHI 2022 and CHI 2023, this workshop aims to advance the research field by further exploring current research trends, such as applying large language models and visual language models. Previous work has explored computational approaches to understanding and adapting UIs using constraint-based optimization models and machine learning-based data-driven approaches. In addition to further delving into these established UI research areas, we aim to trigger the exploration into the application of the latest advancements in general-purpose large language and vision-language models within the UI domain. We will encourage participants to explore novel methods for understanding, automating, and evaluating UIs. The proposed workshop seeks to bring together academic researchers and industry practitioners interested in computational approaches for UIs to discuss the needs and opportunities for future user interface algorithms, models, and applications. View details
    Partially Interpretable Models with Guarantees on Coverage and Accuracy
    Nave Frost
    Zachary Lipton
    Michal Moshkovitz
    Algorithmic Learning Theory (ALT) (2024)
    Preview abstract Simple, sufficient explanations furnished by short decision lists can be useful for guiding stakeholder actions. Unfortunately, this transparency can come at the expense of the higher accuracy enjoyed by black box methods, like deep nets. To date, practitioners typically either (i) insist on the simpler model, forsaking accuracy; or (ii) insist on maximizing accuracy, settling for post-hoc explanations of dubious faithfulness. In this paper, we propose a hybrid \emph{partially interpretable model} that represents a compromise between the two extremes. In our setup, each input is first processed by a decision list that can either execute a decision or abstain, handing off authority to the opaque model. The key to optimizing the decision list is to optimally trade off the accuracy of the composite system against coverage (the fraction of the population that receives explanations). We contribute a new principled algorithm for constructing partially interpretable decision lists, providing theoretical guarantees addressing both interpretability and accuracy. As an instance of our result, we prove that when the optimal decision list has length $k$, coverage $c$, and $b$ mistakes, our algorithm will generate a decision list that has length no greater than $4k$, coverage at least $c/2$, and makes at most $4b$ mistakes. Finally, we empirically validate the effectiveness of the new model. View details
    Preview abstract One of the most basic problems for studying the "price of privacy over time" is the so called private counter problem, introduced by Dwork et al. (2010) and Chan et al. (2011). In this problem, we aim to track the number of events that occur over time, while hiding the existence of every single event. More specifically, in every time step $t\in[T]$ we learn (in an online fashion) that $\Delta_t\geq 0$ new events have occurred, and must respond with an estimate $n_t\approx\sum_{j=1}^t \Delta_j$. The privacy requirement is that all of the outputs together, across all time steps, satisfy event level differential privacy. The main question here is how our error needs to depend on the total number of time steps $T$ and the total number of events $n$. Dwork et al. (2015) showed an upper bound of $O\left(\log(T)+\log^2(n)\right)$, and Henzinger et al. (2023) showed a lower bound of $\Omega\left(\min\{\log n, \log T\}\right)$. We show a new lower bound of $\Omega\left(\min\{n,\log T\}\right)$, which is tight w.r.t. the dependence on $T$, and is tight in the sparse case where $\log^2 n=O(\log T)$. Our lower bound has the following implications: * We show that our lower bound extends to the online thresholds problem, where the goal is to privately answer many "quantile queries" when these queries are presented one-by-one. This resolves an open question of Bun et al. (2017). * Our lower bound implies, for the first time, a separation between the number of mistakes obtainable by a private online learner and a non-private online learner. This partially resolves a COLT'22 open question published by Sanyal and Ramponi. * Our lower bound also yields the first separation between the standard model of private online learning and a recently proposed relaxed variant of it, called private online prediction. View details
    Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
    Susan Lin
    Jeremy Warner
    J.D. Zamfirescu-Pereira
    Matthew G Lee
    Sauhard Jain
    Michael Xuelin Huang
    Bjoern Hartmann
    Can Liu
    Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA
    Preview abstract Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors to support the review and interaction with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge, and transform dictated text without specifying precise editing locations. Together they pave the way for interactive dictation and revision that help close gaps between spontaneously spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor + ChatGPT, as it better facilitates iterative revisions with enhanced user control over the content while supporting surprisingly diverse user strategies. View details
    Preview abstract This paper presents NOMAD (Non-Matching Audio Distance), a differentiable perceptual similarity metric that measures the distance of a degraded signal against non-matching references. The proposed method is based on learning deep feature embeddings via a triplet loss guided by the Neurogram Similarity Index Measure (NSIM) to capture degradation intensity. During inference, the similarity score between any two audio samples is computed through Euclidean distance of their embeddings. NOMAD is fully unsupervised and can be used in general perceptual audio tasks for audio analysis e.g. quality assessment and generative tasks such as speech enhancement and speech synthesis. The proposed method is evaluated with 3 tasks. Ranking degradation intensity, predicting speech quality, and as a loss function for speech enhancement. Results indicate NOMAD outperforms other non-matching reference approaches in both ranking degradation intensity and quality assessment, exhibiting competitive performance with full-reference audio metrics. NOMAD demonstrates a promising technique that mimics human capabilities in assessing audio quality with non-matching references to learn perceptual embeddings without the need for human-generated labels. View details
    Preview abstract A vast amount of human discussion, storytelling, content creation, and reporting now occurs on social media platforms. As such, social media posts are often quoted on web pages as context. In this paper, we argue that these quotations and their surrounding page context provide a rich, platform-independent source of data for studying the intersection of natural language and social media. We introduce a taxonomy of quotation roles that categorizes how social media posts are used within content. We release a dataset of 38M social quotes derived from the Common Crawl, and role labels for a subset assessed by human raters. We show that the interplay of accounts, roles, and topics across the web graph reveal valuable social diffusion patterns, and that roles can be predicted with fine-tuned large language models from web context. View details
    Exploring the Feasibility of Remote Cardiac Auscultation Using Earphones
    Tao Chen
    Yongjie Yang
    Xiuzhen Guo
    Jie Xiong
    Shangguan Longfei
    MobiCom 2024: The 30th Annual International Conference On Mobile Computing And Networking
    Preview abstract The elderly over 65 accounts for 80% of COVID deaths in the United States. In response to the pandemic, the federal, state governments, and commercial insurers are promoting video visits, through which the elderly can access specialists at home over the Internet, without the risk of COVID exposure. However, the current video visit practice barely relies on video observation and talking. The specialist could not assess the patient's health conditions by performing auscultations. This paper tries to address this key missing component in video visits by proposing Asclepius, a hardware-software solution that turns the patient's earphones into a stethoscope, allowing the specialist to hear the patient's fine-grained heart sound (i.e., PCG signals) in video visits. To achieve this goal, we contribute a low-cost plug-in peripheral that repurposes the earphone's speaker into a microphone and uses it to capture the patient's minute PCG signals from her ear canal. As the PCG signals suffer from strong attenuation and multi-path effects when propagating from the heart to ear canals, we then propose efficient signal processing algorithms coupled with a data-driven approach to de-reverberate and further correct the amplitude and frequency distortion in raw PCG receptions. We implement Asclepius on a 2-layer PCB board and follow the IRB protocol to evaluate its performance with 30 volunteers. Our extensive experiments show that Asclepius can effectively recover Phonocardiogram (PCG) signals with different types of earphones. The feedback from cardiologists also confirms the efficacy and efficiency of our system. PCG signal samples and benchmark results can be found at an anonymous link https://asclepius-system.github.io/ View details
    Preview abstract Neglected tropical diseases (NTDs) and infectious diseases disproportionately affect the poorest regions of the world. While large language models (LLMs) have shown promise for medical question answering, there is limited work focused on tropical and infectious disease-specific explorations. We introduce TRINDs, a dataset of 52 tropical and infectious diseases with demographic and semantic clinical and consumer augmentations. We evaluate various context and counterfactual locations to understand their influence on LLM performance. Results show that LLMs perform best when provided with contextual information such as demographics, location, and symptoms. We also develop TRINDs-LM, a tool that enables users to enter symptoms and contextual information to receive a most likely diagnosis. In addition to the LLM evaluations, we also conducted a human expert baseline study to assess the accuracy of human experts in diagnosing tropical and infectious diseases with 7 medical and public health experts. This work demonstrates methods for creating and evaluating datasets for testing and optimizing LLMs, and the use of a tool that could improve digital diagnosis and tracking of NTDs. View details
    Community search signatures as foundation features for human-centered geospatial modeling
    Chaitanya Kamath
    Mohit Agarwal
    David Schottlander
    Shailesh Bavadekar
    Niv Efron
    Shravya Shetty
    ICML 2024 Workshop on Data-Centric Machine Learning Research
    Preview abstract Aggregated relative search frequencies offer a unique composite signal reflecting people's habits, concerns, interests, intents, and general information needs, which are not found in other readily available datasets. Temporal search trends have been successfully used to perform nowcasting across a variety of domains such as infectious diseases, unemployment rates, and retail sales. However, most existing applications require curating specialized datasets of individual keywords, queries, or query clusters, and the search data need to be temporally aligned with the outcome variable of interest. We propose a novel approach for generating an aggregated and anonymized representation of search interest as foundation features at the community level for geospatial modeling. We benchmark these features using spatial datasets across multiple domains. In regions with a population greater than 3000 that cover over 95% of the contiguous US population, our models achieve an average R-squared score of 0.74 across 21 health variables, and 0.80 across 6 demographic and environmental variables. Our results demonstrate that these search features can be used for spatial predictions without strict temporal alignment, and that the resulting models outperform spatial interpolation and state of the art methods using satellite imagery features. View details
    Preview abstract End-to-end models for speech recognition and speech synthesis have many benefits, but we argue they also face a unique set of challenges not encountered in conventional multi-stage hybrid systems, which relied on the explicit injection of linguistic knowledge through resources such as phonemic dictionaries and verbalization grammars. These challenges include handling words with unusual grapheme-to-phoneme correspondences, converting between written forms like ‘12’ and spoken forms such as ‘twelve’, and contextual disambiguation of homophones or homographs. We describe the mitigation strategies that have been used for these problems in end-to-end systems, either implicitly or explicitly, and call out that the most commonly used mitigation techniques are likely incompatible with newly emerging approaches that use minimal amounts of supervised audio training data. We review best-of-both-world approaches that allow the use of end-to-end models combined with traditional linguistic resources, which we show are increasingly straightforward to create at scale, and close with an optimistic outlook for bringing speech technologies to many more languages by combining these strands of research. View details
    Design Principles and Challenges for Gaze + Pinch Interaction in XR
    Ken Pfeuffer
    Hans Gellersen
    IEEE Computer Graphics and Applications (2024)
    Preview abstract For Extended Reality (XR) headsets, a key aim is the natural interaction in 3D space beyond what traditional methods of keyboard, mouse, and touchscreen can offer. With the release of the Apple Vision Pro, a novel interaction paradigm is now widely available where users seamlessly navigate content through the combined use of their eyes and hands. However, blending these modalities poses unique design challenges due to their dynamic nature and the absence of established principles and standards. In this article, we present five design principles and issues for the Gaze + Pinch interaction technique, informed by eye-hand research in the human-computer interaction field. The design principles encompass mechanisms like division of labor and minimalistic timing, which are crucial for usability, alongside enhancements for the manipulation of objects, indirect interactions, and drag & drop. Whether in design, technology, or research domains, this exploration offers valuable perspectives for navigating the evolving landscape of 3D interaction. View details
    Quartic Quantum Speedups for Planted Inference Problems
    Alexander Schmidhuber
    Ryan O'Donnell
    arXiv:2406.19378 (2024)
    Preview abstract We describe a quantum algorithm for the Planted Noisy kXOR problem (also known as sparse Learning Parity with Noise) that achieves a nearly quartic (4th power) speedup over the best known classical algorithm while also only using logarithmically many qubits. Our work generalizes and simplifies prior work of Hastings, by building on his quantum algorithm for the Tensor Principal Component Analysis (PCA) problem. We achieve our quantum speedup using a general framework based on the Kikuchi Method (recovering the quartic speedup for Tensor PCA), and we anticipate it will yield similar speedups for further planted inference problems. These speedups rely on the fact that planted inference problems naturally instantiate the Guided Sparse Hamiltonian problem. Since the Planted Noisy kXOR problem has been used as a component of certain cryptographic constructions, our work suggests that some of these are susceptible to super-quadratic quantum attacks. View details
    Meta Lifelong-Learning With Selective and Task-Aware Adaptation
    Thanapapas Horsuwan
    Kasidis Kanwatchara
    Boonserm Kijsirikul
    Peerapon Vateekul
    IEEE Access, 12 (2024), pp. 34099-34115
    Preview abstract Meta-learning has been applied to lifelong language learning due to its ability to find an optimal model for efficient adaptation to any learned tasks. Generally, meta lifelong-learning partially stores samples from seen tasks in a memory and selects some of them to train the model, refresh the knowledge, and adapt the model for inference. However, the sample selection for these steps was usually done in a sub-optimal manner in existing work. Hence, we propose MeLSTA (Meta Lifelong-Learning with Selective and Task-Aware Adaptation) to effectively select the samples based on task identifiers and the adaptation scores reflecting the model behavior after adaptation. The results show that MeLSTA enhances the accuracy by 1.2% over the state-of-the-art while significantly shrinking the training duration by over 6 times. Additionally, our in-depth analysis reveals the strengths and limitations of MeLSTA and existing work, providing useful insights for future designs of meta lifelong-learning for NLP. View details
    Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
    Shuo Yang
    Aniruddh Nath
    Yang Liu
    Li Wei
    Shawn Andrews
    Maciej Kula
    Jarrod Kahn
    Zhe Zhao
    Lichan Hong
    Preview abstract Knowledge Distillation (KD) is a powerful approach for compressing large models into smaller, more efficient models, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring the consistent and reliable generation of high-quality teacher labels from continuous data streams. View details