Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 10129 publications
Density-based User Representation through Gaussian Process Regression for Multi-interest Personalized Retrieval
Haolun Wu
Xue Liu
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
Abstract
Personalized recommendation systems are increasingly essential in our information-rich society, aiding users in navigating the expansive online realm. However, accurately modeling the diverse and dynamic interests of users remains a formidable challenge. Existing user modeling methods, like Single-point User Representation (SUR) and Multi-point User Representation (MUR), have limitations in terms of accuracy, diversity, computational cost, and adaptability. To overcome these challenges, we introduce a novel model, the Density-based User Representation (DUR), leveraging Gaussian Process Regression (GPR), which has not been extensively explored in multi-interest recommendation and retrieval. Our approach inherently captures user interest dynamics without manual tuning, provides uncertainty awareness, and is more efficient than point-based representation methods. This paper outlines the development and implementation of GPR4DUR, details its evaluation protocols, and presents extensive analysis demonstrating its effectiveness and efficiency. Experiments on real-world offline datasets confirm our method's adaptability and efficiency. Further online experiments simulating user behavior illuminate the benefits of our method in the exploration-exploitation trade-off by effectively utilizing model uncertainty.
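To make the density-based idea concrete, the following is a minimal sketch, not the paper's implementation: a user is represented by a Gaussian Process fit over the embeddings of items they have interacted with, and candidates are ranked by the posterior mean with the posterior standard deviation as an uncertainty bonus. The embeddings, feedback scores, and the 0.1 exploration weight are illustrative assumptions.

```python
# Minimal sketch (not GPR4DUR itself): represent a user with a Gaussian Process
# fit over the embeddings of items they interacted with, then rank candidates
# by posterior mean plus an uncertainty bonus (uncertainty-aware retrieval).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(1000, 32))      # hypothetical item vectors
history_ids = rng.choice(1000, size=50, replace=False)
feedback = rng.uniform(0.0, 1.0, size=50)          # hypothetical affinity scores

# Fit a GP mapping item embedding -> observed user affinity.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gpr.fit(item_embeddings[history_ids], feedback)

# Score every candidate: posterior mean plus an exploration bonus from the
# posterior standard deviation.
mean, std = gpr.predict(item_embeddings, return_std=True)
scores = mean + 0.1 * std
top_k = np.argsort(-scores)[:15]
print(top_k)
```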
HyperAttention: Large-scale Attention in Linear Time
Amin Karbasi
Amir Zandieh
Insu Han
David Woodruff
HyperAttention: Long-context Attention in Near-Linear Time (2024) (to appear)
Abstract
In this paper, we introduce a novel approximate attention mechanism dubbed "HyperAttention". Despite the rapidly increasing size and complexity of contexts used with Large Language Models (LLMs), there is still a dire lack of computationally efficient attention mechanisms that scale better than the naive quadratic time. HyperAttention addresses this gap: it delivers provably linear time complexity with respect to the size of the context, while incurring only a negligible loss in downstream quality. Distinctively, it integrates the principles of Locality Sensitive Hashing (LSH), for efficient detection of heavy elements, with uniform column sampling, allowing for a good approximation of both the heavy and light components of the attention matrix. HyperAttention provably approximates the attention layer in linear time, making it the first practical linear-time approximate attention mechanism. Crucially, HyperAttention has a highly modular design, allowing seamless integration of other rapid low-level implementations, most notably FlashAttention. Empirical evaluations indicate that HyperAttention surpasses existing methods, achieving orders-of-magnitude speed-ups compared to prevalent state-of-the-art solutions such as FlashAttention. This breakthrough has significant implications for scaling LLMs to significantly larger contexts.
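As a rough illustration of one ingredient named in the abstract, the sketch below shows uniform column (key) sampling only; the LSH-based detection of heavy entries, and HyperAttention's actual estimator and guarantees, are not reproduced here. Shapes and sample sizes are arbitrary.

```python
# Illustrative sketch of the uniform column-sampling component only (the full
# HyperAttention algorithm also uses LSH to find heavy entries; omitted here).
import numpy as np

def sampled_attention(Q, K, V, m, rng):
    """Approximate softmax(Q K^T / sqrt(d)) V using m uniformly sampled keys."""
    n, d = K.shape
    idx = rng.choice(n, size=m, replace=False)      # uniform column sample
    logits = Q @ K[idx].T / np.sqrt(d)              # (n_q, m) instead of (n_q, n)
    logits -= logits.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)  # ratio estimator: scale factors cancel
    return weights @ V[idx]

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 64))
K = rng.normal(size=(4096, 64))
V = rng.normal(size=(4096, 64))
approx = sampled_attention(Q, K, V, m=256, rng=rng)
print(approx.shape)  # (8, 64)
```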
An artificial neural network to estimate the foliar and ground cover input variables of the Rangeland Hydrology and Erosion Model
Mahmoud Saeedimoghaddam
David Goodrich
Mariano Hernandez
David Phillip Guertin
Loretta J. Metz
Guillermo Ponce-Campos
Haiyan Wei
Shea Burns
Sarah E. McCord
Mark A. Nearing
C. Jason Williams
Carrie-Ann Houdeshell
Mashrekur Rahman
Menberu B. Meles
Steve Barker
Journal of Hydrology (2024)
Abstract
Models like the Rangeland Hydrology and Erosion Model (RHEM) are useful for estimating soil erosion; however, they rely on input parameters that are sometimes difficult or expensive to measure. Specifically, RHEM requires information about foliar and ground cover fractions that generally must be measured in situ, which makes it difficult to use models like RHEM to produce erosion or soil risk maps for areas exceeding the size of a hillslope, such as a large watershed. We previously developed a deep learning emulator of RHEM that has low computational expense and can, in principle, be run over large areas (e.g., over the continental US). In this paper, we develop a deep learning model to estimate the RHEM ground cover inputs from remote sensing time series, reducing the need for extensive field surveys to produce erosion maps. We achieve a prediction accuracy on hillslope runoff of r² = 0.9, and on soil loss and sediment yield of r² = 0.4, at 66,643 field locations within the US. We demonstrate how this approach can be used for mapping by developing runoff, soil loss, and sediment yield maps over a 1356 km² region of interest in Nebraska.
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
Shiwei Liu
Guanchen Tao
Yifei Zou
Derek Chow
Zichen Fan
Kauna Lei
Bangfei Pan
Dennis Sylvester
Mehdi Saligane
arXiv (2024)
Abstract
The self-attention mechanism sets transformer-based large language models (LLMs) apart from convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon is challenging due to the extensive use of Softmax in self-attention. Beyond its non-linearity, the low arithmetic intensity greatly reduces processing parallelism, which becomes the bottleneck especially when dealing with longer contexts. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design that serves as an efficient Softmax alternative. ConSmax employs differentiable normalization parameters to remove the maximum search and denominator summation in Softmax. It allows for massive parallelization while performing the critical functions of Softmax. In addition, a scalable ConSmax hardware design utilizing a bitwidth-split look-up table (LUT) can produce lossless non-linear operations and supports mixed-precision computing, further facilitating efficient LLM inference. Experimental results show that ConSmax achieves a minuscule power consumption of 0.2 mW and an area of 0.0008 mm² at a 1250 MHz working frequency in 16 nm CMOS technology. Compared to state-of-the-art Softmax hardware, ConSmax achieves 3.35x power and 2.75x area savings with comparable accuracy on a GPT-2 model and the WikiText103 dataset.
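The following is a hedged sketch of the idea as stated in the abstract: replace the data-dependent row maximum and denominator sum in Softmax with learnable constants, so every element can be normalized independently. The module name, parameter shapes, and initial values are assumptions, not the paper's design.

```python
# Sketch of a "constant softmax" idea: no per-row max search and no per-row
# summation, only an elementwise exponential with learnable shift and scale.
# Parameter shapes and initial values are assumptions, not ConSmax's exact design.
import torch
import torch.nn as nn

class ConstantSoftmaxSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))   # learnable shift (replaces row max)
        self.gamma = nn.Parameter(torch.ones(1))   # learnable scale (replaces row sum)

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # Purely elementwise, hence massively parallelizable in hardware.
        return torch.exp(scores - self.beta) / self.gamma

attn_scores = torch.randn(2, 8, 128, 128)          # (batch, heads, q_len, k_len)
weights = ConstantSoftmaxSketch()(attn_scores)
print(weights.shape)
```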
Abstract
Specialized large multi-modal models (LMMs) have exhibited remarkable performance across numerous tasks; however, generalist LMMs suffer from performance degradation when trained on a large collection of tasks. Recent research suggests that Mixture-of-Experts (MoE) models help with instruction tuning; however, for LMMs with parameter counts on the order of 50-100B, the prohibitive cost of replicating and storing the expert models severely limits the number of experts that can be used.
We propose Omni-SMoLA, which softly mixes many multimodal low-rank experts into a large model without introducing a significant number of new parameters compared to conventional MoE models. The core idea is that the large model provides a foundational backbone and different lightweight experts learn specialized knowledge residually. Extensive experiments demonstrate that the SMoLA approach helps improve generalist performance across a broad range of visual question answering and captioning tasks, achieving a new state-of-the-art generalist performance that matches or outperforms single specialized LMM baselines.
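A minimal sketch, under assumed shapes and names, of the general pattern described here: lightweight low-rank experts add residual corrections on top of a frozen backbone layer, weighted by a soft router. This is illustrative only, not the Omni-SMoLA architecture itself.

```python
# Soft mixture of low-rank experts added residually to a frozen backbone layer.
# All dimensions, ranks, and initializations are assumptions for illustration.
import torch
import torch.nn as nn

class SoftLowRankMixture(nn.Module):
    def __init__(self, d_in, d_out, num_experts=4, rank=8):
        super().__init__()
        self.backbone = nn.Linear(d_in, d_out)
        self.backbone.requires_grad_(False)                      # frozen base layer
        self.down = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.up = nn.Parameter(torch.zeros(num_experts, rank, d_out))  # start at zero residual
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x):                                        # x: (batch, d_in)
        base = self.backbone(x)
        gates = torch.softmax(self.router(x), dim=-1)            # (batch, num_experts)
        low_rank = torch.einsum('bi,eir,ero->beo', x, self.down, self.up)
        residual = torch.einsum('be,beo->bo', gates, low_rank)   # soft mixture of experts
        return base + residual

layer = SoftLowRankMixture(d_in=64, d_out=64)
print(layer(torch.randn(3, 64)).shape)  # torch.Size([3, 64])
```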
Abstract
We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones. The model is trained with simulated room impulse responses (RIRs) using omni-directional microphones, without needing to collect real RIRs. By relying on specific angular regions and multiple room simulations, the model utilizes consistent time difference of arrival (TDOA) cues, or what we call delay contrast, to separate target and interference sources while remaining robust in various reverberation environments. We demonstrate that the model not only generalizes to a commercially available device with a slightly different microphone geometry, but also outperforms our previous work, which uses one additional microphone on the same device. The model runs in real time on-device and is suitable for low-latency streaming applications such as telephony and video conferencing.
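For intuition on the delay-contrast cue, here is a small worked example: for a far-field source at angle theta from broadside, the inter-microphone delay is d * sin(theta) / c, so sources in different angular regions produce distinct delay signatures. The 7 cm spacing and 16 kHz sample rate below are illustrative assumptions, not the device's actual geometry.

```python
# Back-of-the-envelope TDOA ("delay contrast") for a two-microphone array.
# Mic spacing and sample rate are hypothetical.
import math

SPEED_OF_SOUND = 343.0      # m/s
MIC_SPACING = 0.07          # 7 cm, assumed device geometry
SAMPLE_RATE = 16000         # Hz

for theta_deg in (0, 30, 60, 90):
    tdoa = MIC_SPACING * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    print(f"{theta_deg:3d} deg -> TDOA {tdoa * 1e6:6.1f} us "
          f"({tdoa * SAMPLE_RATE:4.2f} samples at 16 kHz)")
```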
Abstract
We're roughly 10 years into the OpenConfig journey. We have implementations in hand from various vendors, and we've gained significant operational experience in the domains of streaming telemetry and developing configuration systems that leverage the developed models. What have we learned? Are the abstractions we've generated the right ones? If not, why? Were we too influenced by the tools and inertia of the time when we made some critical decisions? How do we need to evolve going forward? This discussion is part retrospective and part introspective: a candid look at where we've been and what we need to think about as we evolve the next generation of our management (and control) planes. What should we be thinking about as network engineers who write software?
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
Susan Lin
Jeremy Warner
J.D. Zamfirescu-Pereira
Matthew G Lee
Sauhard Jain
Michael Xuelin Huang
Bjoern Hartmann
Can Liu
Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA
Abstract
Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors to support the review and interaction with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge, and transform dictated text without specifying precise editing locations. Together they pave the way for interactive dictation and revision that help close gaps between spontaneously spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor + ChatGPT, as it better facilitates iterative revisions with enhanced user control over the content while supporting surprisingly diverse user strategies.
Abstract
Use of Text-to-Image (T2I) models is expanding beyond generating generic objects, as they are increasingly being adopted by diverse global communities to create visual representations of their unique cultures. Current T2I benchmarks primarily evaluate image-text alignment, aesthetics, and fidelity of generations for complex prompts with generic objects, overlooking the critical dimension of cultural understanding. In this work, we address this gap by defining a framework to evaluate the cultural competence of T2I models, and present a scalable approach to collect cultural artifacts unique to a particular culture from Knowledge Graphs and Large Language Models in tandem. We assess the ability of state-of-the-art T2I models to generate culturally faithful and realistic images across 8 countries and 3 cultural domains. Furthermore, we emphasize the importance of T2I models reflecting a culture's diversity and introduce cultural diversity as a novel metric for T2I evaluation, drawing inspiration from the Vendi Score. We introduce T2I-GCube, a first-of-its-kind benchmark for T2I evaluation. T2I-GCube includes cultural prompts, metrics, and cultural concept spaces, enabling comprehensive assessment of T2I models' cultural knowledge and diversity. Our evaluations reveal significant gaps in the cultural knowledge of existing models and provide valuable insights into the diversity of image outputs for under-specified prompts. By introducing a novel approach to evaluating cultural diversity and knowledge in T2I models, T2I-GCube will be instrumental in fostering the development of models with enhanced cultural competence.
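As background for the diversity metric mentioned above, here is a sketch of the standard Vendi Score computed over a set of image embeddings: the exponential of the Shannon entropy of the eigenvalues of the normalized similarity kernel. The paper's cultural diversity metric is only described as drawing inspiration from this score, so the code is illustrative rather than the benchmark's exact metric.

```python
# Vendi-Score-style diversity over image embeddings (standard form, not the
# paper's exact cultural-diversity metric).
import numpy as np

def vendi_score(embeddings: np.ndarray) -> float:
    """exp(Shannon entropy of eigenvalues of the normalized similarity kernel)."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kernel = x @ x.T                                   # cosine similarities, K_ii = 1
    eigvals = np.linalg.eigvalsh(kernel / len(x))      # eigenvalues sum to 1
    eigvals = eigvals[eigvals > 1e-12]                 # drop numerical zeros
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

rng = np.random.default_rng(0)
diverse = rng.normal(size=(50, 128))                       # spread-out embeddings
redundant = np.tile(rng.normal(size=(1, 128)), (50, 1))    # near-identical embeddings
print(vendi_score(diverse), vendi_score(redundant))        # high vs. ~1.0
```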
Factual and Personalized Recommendation Language Modeling with Reinforcement Learning
Jihwan Jeong
Mohammad Ghavamzadeh
Proceedings of the First Conference on Language Modeling (COLM-24), Philadelphia (2024)
Abstract
Recommender systems (RSs) play a central role in connecting users to products, content, and services by matching candidate items to users based on their preferences. While existing RSs often rely on implicit user feedback on recommended items (e.g., clicks, watches, ratings), conversational recommender systems interact with users to provide tailored recommendations in natural language. In this work, we aim to develop a recommender language model (LM) that is capable of generating compelling endorsement presentations of relevant items to users, to better explain the details of the items, to connect the items with users' preferences, and to enhance the likelihood of users accepting recommendations. Specifically, such an LLM-based recommender can understand users' preferences from their RS embeddings summarizing feedback history, and output responses that are not only factually grounded but also explain, in a convincing manner, whether these items satisfy users' preferences. The pivotal question is how one can gauge the performance of such an LLM recommender. Equipped with a joint reward function that measures factual consistency, convincingness, and personalization, not only can we evaluate the efficacy of different recommender LMs, but we can also utilize this metric as a form of AI feedback to fine-tune our LLM agent via reinforcement learning (RL). Building upon the MovieLens movie recommendation benchmark, we developed a novel conversational recommender delivering personalized movie narratives to users. This work lays the groundwork for recommendation systems that prioritize individualized user experiences without compromising on transparency and integrity.
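To illustrate how a joint reward of this kind could be assembled, a hypothetical sketch follows. The component scorers, weights, and function names are placeholders, not the paper's reward models.

```python
# Hypothetical joint reward combining factuality, convincingness, and
# personalization into one scalar usable as AI feedback for RL fine-tuning.
# All scorers and weights below are placeholders.
from dataclasses import dataclass

@dataclass
class RewardWeights:
    factuality: float = 0.4
    convincingness: float = 0.3
    personalization: float = 0.3

def joint_reward(response, item_facts, user_profile,
                 score_factuality, score_convincingness, score_personalization,
                 w: RewardWeights = RewardWeights()) -> float:
    """Weighted sum of component rewards, each assumed to return a value in [0, 1]."""
    return (w.factuality * score_factuality(response, item_facts)
            + w.convincingness * score_convincingness(response)
            + w.personalization * score_personalization(response, user_profile))

# Toy usage with constant stand-in scorers.
r = joint_reward("Sample pitch...", "item facts...", "user profile...",
                 lambda resp, facts: 0.9, lambda resp: 0.7, lambda resp, prof: 0.8)
print(round(r, 3))  # 0.81
```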
Improved Communication-Privacy Trade-offs in L2 Mean Estimation under Streaming Differential Privacy
Wei-Ning Chen
Albert No
Sewoong Oh
International Conference on Machine Learning (ICML) (2024)
Abstract
We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers.
In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm directly operates in $L_2$ geometry, yielding MSEs that converge quickly to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL type optimizers. Empirically, our method demonstrates at least a 100x compression improvement for DP-SGD across various FL tasks.
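For readers unfamiliar with the mechanism being accounted for, the sketch below shows the generic mechanics of a sparsified Gaussian mechanism: keep a random subset of coordinates, rescale for unbiasedness, and add Gaussian noise. The paper's contribution, a tight $L_2$ privacy accountant that folds the sparsification randomness into the analysis, is not reproduced, and the parameters are arbitrary.

```python
# Generic sparsified Gaussian mechanism: random coordinate subsampling
# (compression), inverse-probability rescaling, and Gaussian noise.
# Privacy accounting is deliberately omitted; parameters are arbitrary.
import numpy as np

def sparsified_gaussian(x: np.ndarray, k: int, noise_std: float, rng) -> np.ndarray:
    d = x.shape[0]
    kept = rng.choice(d, size=k, replace=False)     # communicate only k coordinates
    out = np.zeros_like(x)
    out[kept] = x[kept] * (d / k)                   # rescale so the estimate is unbiased
    out[kept] += rng.normal(scale=noise_std, size=k)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
estimate = np.mean([sparsified_gaussian(x, k=100, noise_std=0.5, rng=rng)
                    for _ in range(2000)], axis=0)
print(np.mean((estimate - x) ** 2))                 # MSE of the averaged estimate
```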
Abstract
Misgendering is the act of referring to someone in a way that does not reflect their gender identity. Translation systems, including foundation models capable of translation, can produce errors that result in misgendering harms. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages. The dataset is constructed with handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both dedicated neural machine translation systems and foundation models, and show that all systems exhibit errors resulting in misgendering harms, even in high-resource languages.
Scalable Learning of Segment-Level Traffic Congestion Functions
Shushman Choudhury
Aboudy Kreidieh
Alexandre Bayen
IEEE Intelligent Transportation Systems Conference (2024)
Abstract
We propose and study a data-driven framework for identifying traffic congestion functions (numerical relationships between observations of traffic variables) at global scale and segment-level granularity. In contrast to methods that estimate a separate set of parameters for each roadway, ours learns a single black-box function over all roadways in a metropolitan area. First, we pool traffic data from all segments into one dataset, combining static attributes with dynamic time-dependent features. Second, we train a feed-forward neural network on this dataset, which we can then use on any segment in the area. We evaluate how well our framework identifies congestion functions on observed segments and how it generalizes to unobserved segments and predicts segment attributes on a large dataset covering multiple cities worldwide. For identification error on observed segments, our single data-driven congestion function compares favorably to segment-specific model-based functions on highway roads, but has room to improve on arterial roads. For generalization, our approach shows strong performance across cities and road types: both on unobserved segments in the same city and on zero-shot transfer learning between cities. Finally, for predicting segment attributes, we find that our approach can approximate critical densities for individual segments using their static properties.
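Schematically, the pooled setup described above amounts to a single feed-forward network over concatenated static and dynamic features; the sketch below uses made-up feature names and dimensions and is not the paper's model.

```python
# One feed-forward congestion function shared by all segments in an area:
# static segment attributes are concatenated with dynamic time-dependent
# features. Feature choices and dimensions here are assumptions.
import torch
import torch.nn as nn

STATIC_DIM = 8      # e.g., road type, lanes, speed limit (hypothetical encoding)
DYNAMIC_DIM = 6     # e.g., time of day, recent traffic observations

congestion_fn = nn.Sequential(
    nn.Linear(STATIC_DIM + DYNAMIC_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),                              # predicted traffic variable (e.g., speed)
)

static = torch.randn(32, STATIC_DIM)               # 32 segments pooled into one batch
dynamic = torch.randn(32, DYNAMIC_DIM)
pred = congestion_fn(torch.cat([static, dynamic], dim=-1))
print(pred.shape)  # torch.Size([32, 1])
```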
TRINDs: Assessing the Diagnostic Capabilities of Large Language Models for Tropical and Infectious Diseases
Steve Adudans
Oluwatosin Akande
Chintan Ghate
Sylvanus Aitkins
Geoffrey Siwo
Lynda Osadebe
Nenad Tomašev
Eric Ndombi
Abstract
Neglected tropical diseases (NTDs) and infectious diseases disproportionately affect the poorest regions of the world. While large language models (LLMs) have shown promise for medical question answering, there is limited work focused specifically on tropical and infectious diseases. We introduce TRINDs, a dataset of 52 tropical and infectious diseases with demographic and semantic clinical and consumer augmentations. We evaluate various contexts and counterfactual locations to understand their influence on LLM performance. Results show that LLMs perform best when provided with contextual information such as demographics, location, and symptoms. We also develop TRINDs-LM, a tool that enables users to enter symptoms and contextual information to receive the most likely diagnosis. In addition to the LLM evaluations, we conducted a human expert baseline study with 7 medical and public health experts to assess their accuracy in diagnosing tropical and infectious diseases. This work demonstrates methods for creating and evaluating datasets for testing and optimizing LLMs, and the use of a tool that could improve digital diagnosis and tracking of NTDs.
KATch: A Fast Symbolic Verifier for NetKAT
Mark Moeller
Jules Jacobs
Olivier Savary Belanger
David Darais
Cole Schlesinger
Nate Foster
Alexandra Silva
Programming Language Design and Implementation (PLDI) (2024) (to appear)
Abstract
We develop new data structures and algorithms for checking verification queries in NetKAT, a domain-specific language for specifying the behavior of network data planes. Our results extend the techniques obtained in prior work on symbolic automata and provide a framework for building efficient and scalable verification tools. We present KATch, an implementation of these ideas in Scala, including extended logical operators that are useful for expressing network-wide specifications, and optimizations that construct a bisimulation quickly or generate a counter-example showing that none exists. We evaluate the performance of our implementation on real-world and synthetic benchmarks, verifying properties such as reachability and slice isolation, typically returning a result in well under a second, which is orders of magnitude faster than previous approaches.
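As a generic illustration of the bisimulation-or-counterexample style of check mentioned above (KATch's symbolic NetKAT automata and data structures are far richer), here is a textbook-style sketch for two small deterministic automata:

```python
# Generic equivalence check: explore pairs of states of two deterministic
# automata; either every reachable pair agrees on acceptance (a bisimulation
# exists) or we return a distinguishing word as a counter-example.
from collections import deque

def equivalent(start1, start2, accept1, accept2, delta1, delta2, alphabet):
    """Return (True, None) if the automata are equivalent, else (False, word)."""
    seen = {(start1, start2)}
    queue = deque([(start1, start2, ())])
    while queue:
        s1, s2, word = queue.popleft()
        if (s1 in accept1) != (s2 in accept2):
            return False, ''.join(word)            # distinguishing counter-example
        for a in alphabet:
            nxt = (delta1[s1][a], delta2[s2][a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((*nxt, word + (a,)))
    return True, None

# Two DFAs over {a, b}: the first accepts words with an odd number of a's,
# the second accepts words ending in 'a'. They are not equivalent.
d1 = {0: {'a': 1, 'b': 0}, 1: {'a': 0, 'b': 1}}
d2 = {0: {'a': 1, 'b': 0}, 1: {'a': 1, 'b': 0}}
print(equivalent(0, 0, {1}, {1}, d1, d2, 'ab'))    # (False, 'aa')
```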