Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Large Language Models as a Proxy For Human Evaluation in Assessing the Comprehensibility of Disordered Speech Transcription
Richard Cave
Katie Seaver
Jordan Green
Rus Heywood
Proceedings of ICASSP, IEEE (2024)
Preview abstract
Automatic Speech Recognition (ASR) systems, despite significant advances in recent years, still have much room for improvement, particularly in the recognition of disordered speech. Even so, erroneous transcripts from ASR models can help people with disordered speech be better understood, especially if the transcription doesn’t significantly change the intended meaning. Evaluating the efficacy of ASR for this use case requires a methodology for measuring the impact of transcription errors on the intended meaning and comprehensibility. Human evaluation is the gold standard for this, but it can be laborious, slow, and expensive. In this work, we tune and evaluate large language models for this task and find them to be a much better proxy for human evaluators than other metrics commonly used. We further present a case study using the presented approach to assess the quality of personalized ASR models to make model deployment decisions and correctly set user expectations for model quality as part of our trusted tester program.
Preview abstract
Current approaches to Analog Layout Automation apply ML techniques such as Graph Convolutional Neural Networks (GCN) to translate netlist to layout. While these ML approaches have proven to be effective, they lack the powerful reasoning capabilities, intuitive human interface, and standard evaluation benchmarks that have been improving at a rapid development pace in Large Language Models (LLMs). The GLayout framework introduced in this work translates analog layout into an expressive, technology-generic, compact text representation. Then, an LLM is taught to understand analog layout through fine-tuning and in-context learning using Retrieval Augmented Generation (RAG). The LLM is able to successfully lay out unseen circuits based on new information provided in-context. We train 3.8, 7, and 22 billion parameter quantized LLMs on a dataset of fewer than 50 unique circuits and text documents providing layout knowledge. The 22B parameter model is tuned in 2 hours on a single NVIDIA A100 GPU. The open-source evaluation set is proposed as an automation benchmark for LLM layout automation tasks, and ranges from 2-transistor circuits to a ∆Σ ADC. The 22B model completes 70% of the tasks in the evaluation set, and is able to pass DRC and LVS verification on unseen 4-transistor blocks.
Websites Need Your Permission Too – User Sentiment and Decision Making on Web Permission Prompts in Desktop Chrome
Marian Harbach
CHI 2024, ACM (to appear)
Preview abstract
The web utilizes permission prompts to moderate access to certain capabilities. We present the first investigation of user behavior and sentiment of this security and privacy measure on the web, using 28 days of telemetry data from more than 100M Chrome installations on desktop platforms and experience sampling responses from 25,706 Chrome users. Based on this data, we find that ignoring and dismissing permission prompts are most common for geolocation and notifications. Permission prompts are perceived as more annoying and interrupting when they are not allowed, and most respondents cite a rational reason for the decision they took. Our data also supports that the perceived availability of contextual information from the requesting website is associated with allowing access to a requested capability. More usable permission controls could facilitate adoption of best practices that address several of the identified challenges; and ultimately could lead to better user experiences and a safer web.
Building Recommendation Systems using Lambda Architecture
Vipul Bharat Marlecha
Sreyashi Das
International Research Journal of Engineering and Technology (IRJET), Volume: 11 Issue: 05 | May 2024 (2024)
Preview abstract
This paper studies the recommendation systems typical of content discovery and personalized services such as Netflix and Amazon. The study covers the typical components of recommendation systems and the data and inputs required for serving, depending on the machine learning models used. We share how recommendations leverage a mix of batch processing and streaming databases, and end with trends and potential future developments for recommendation systems.
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
Shiwei Liu
Guanchen Tao
Yifei Zou
Derek Chow
Zichen Fan
Kauna Lei
Bangfei Pan
Dennis Sylvester
Mehdi Saligane
Arxiv (2024)
Preview abstract
The self-attention mechanism sets transformer-based large language models (LLMs) apart from convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon is challenging due to the extensively used Softmax in self-attention. Apart from the non-linearity, the low arithmetic intensity greatly reduces the processing parallelism, which becomes the bottleneck especially when dealing with a longer context. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design as an efficient Softmax alternative. ConSmax employs differentiable normalization parameters to remove the maximum searching and denominator summation in Softmax. It allows for massive parallelization while performing the critical tasks of Softmax. In addition, a scalable ConSmax hardware utilizing a bitwidth-split look-up table (LUT) can produce lossless non-linear operation and support mixed-precision computing. It further facilitates efficient LLM inference. Experimental results show that ConSmax achieves a minuscule power consumption of 0.2 mW and area of 0.0008 mm^2 at 1250-MHz working frequency and 16-nm CMOS technology. Compared to state-of-the-art Softmax hardware, ConSmax results in 3.35x power and 2.75x area savings with a comparable accuracy on a GPT-2 model and the WikiText103 dataset.
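As a rough illustration of why removing the maximum search and the denominator summation helps parallelism, the sketch below contrasts a standard softmax with an element-wise stand-in. The form `exp(s - beta) / gamma` and the names `consmax_like`, `beta`, and `gamma` are assumptions made for this example; the abstract states only that differentiable normalization parameters replace the two row-wide reductions.

```python
import math

def softmax(scores):
    """Reference softmax: needs a max search and a sum over the whole row."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def consmax_like(scores, beta, gamma):
    """Element-wise stand-in (assumed form): exp(s - beta) / gamma.

    beta and gamma are fixed constants here, so each output depends only on
    its own input and no reduction over the row is required.
    """
    return [math.exp(s - beta) / gamma for s in scores]

scores = [2.0, 1.0, 0.1]

# Assumption: beta and gamma are chosen by hand to match this one row.
# In the abstract's description they are learnable parameters instead.
beta = max(scores)
gamma = sum(math.exp(s - beta) for s in scores)

print(softmax(scores))
print(consmax_like(scores, beta, gamma))
```

With constants chosen this way the two functions agree on this row. With learned constants the outputs would generally not sum to one, which is the trade-off the abstract evaluates on GPT-2.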
Analyzing Prospects for Quantum Advantage in Topological Data Analysis
Dominic W. Berry
Yuan Su
Casper Gyurik
Robbie King
Joao Basso
Abhishek Rajput
Nathan Wiebe
Vedran Dunjko
PRX Quantum, 5 (2024), pp. 010319
Preview abstract
Lloyd et al. were first to demonstrate the promise of quantum algorithms for computing Betti numbers in persistent homology (a way of characterizing topological features of data sets). Here, we propose, analyze, and optimize an improved quantum algorithm for topological data analysis (TDA) with reduced scaling, including a method for preparing Dicke states based on inequality testing, a more efficient amplitude estimation algorithm using Kaiser windows, and an optimal implementation of eigenvalue projectors based on Chebyshev polynomials. We compile our approach to a fault-tolerant gate set and estimate constant factors in the Toffoli complexity. Our analysis reveals that super-quadratic quantum speedups are only possible for this problem when targeting a multiplicative error approximation and the Betti number grows asymptotically. Further, we propose a dequantization of the quantum TDA algorithm that shows that having exponentially large dimension and Betti number are necessary, but insufficient conditions, for super-polynomial advantage. We then introduce and analyze specific problem examples for which super-polynomial advantages may be achieved, and argue that quantum circuits with tens of billions of Toffoli gates can solve some seemingly classically intractable instances.
Individual Welfare Guarantees in the Autobidding World with Machine-learned Advice
Negin Golrezaei
Patrick Jaillet
Jason Cheuk Nam Liang
Proceedings of the ACM on Web Conference 2024, 267–275
Preview abstract
Online advertising channels commonly focus on maximizing total advertiser welfare to enhance channel health, and previous literature has studied augmenting ad auctions with machine learning predictions on advertiser values (also known as machine-learned advice) to improve total welfare. Yet, such improvements could come at the cost of individual bidders' welfare and do not shed light on how particular advertiser bidding strategies impact welfare. Motivated by this, we present an analysis on an individual bidder's welfare loss in the autobidding world for auctions with and without machine-learned advice, and also uncover how advertiser strategies relate to such losses. In particular, we demonstrate how ad platforms can utilize ML advice to improve welfare guarantee on the aggregate and individual bidder level by setting ML advice as personalized reserve prices when the platform consists of autobidders who maximize value while respecting a return on ad spend (ROAS) constraint. Under parallel VCG auctions with such ML advice-based reserves, we present a worst-case welfare lower-bound guarantee for an individual autobidder, and show that the lower-bound guarantee is positively correlated with ML advice quality as well as the scale of bids induced by the autobidder's bidding strategies. Further, we show that no truthful, and possibly randomized mechanism with anonymous allocations can achieve universally better individual welfare guarantees than VCG, in the presence of personalized reserves based on ML-advice of equal quality. Moreover, we extend our individual welfare guarantee results to generalized first price (GFP) and generalized second price (GSP) auctions. Finally, we present numerical studies using semi-synthetic data derived from ad auction logs of a search ad platform to showcase improvements in individual welfare when setting personalized reserve prices with ML-advice.
Preview abstract
We propose OmniNOCS, a large-scale monocular dataset with 3D Normalized Object Coordinate Space (NOCS) maps, object masks, and 3D bounding box annotations for indoor and outdoor scenes. OmniNOCS has 20 times more object classes and 200 times more instances than existing NOCS datasets (NOCS-Real275, Wild6D). We use OmniNOCS to train a novel, transformer-based monocular NOCS prediction model (NOCSformer) that can predict accurate NOCS, instance masks and poses from 2D object detections across diverse classes. It is the first NOCS model that can generalize to a broad range of classes when prompted with 2D boxes. We evaluate our model on the task of 3D oriented bounding box prediction, where it achieves comparable results to state-of-the-art 3D detection methods such as Cube R-CNN. Unlike other 3D detection methods, our model also provides detailed and accurate 3D object shape and segmentation. We propose a novel benchmark for the task of NOCS prediction based on OmniNOCS, which we hope will serve as a useful baseline for future work in this area. Our dataset and code are available at the project website: https://omninocs.github.io
Reinforcement Learning-Enhanced Cloud-Based Open Source Analog Circuit Generator for Standard and Cryogenic Temperatures in 130-nm and 180-nm OpenPDKs
Ali Hammoud
Anhang Li
Ayushman Tripathi
Wen Tian
Harsh Khandeparkar
Ryan Wans
Boris Murmann
Dennis Sylvester
Mehdi Saligane
Preview abstract
This work introduces an open-source, process-technology-agnostic framework for hierarchical circuit netlist, layout, and Reinforcement Learning (RL) optimization. The layout, netlist, and optimization Python API is fully modular and publicly installable via PyPI. It features a bottom-up hierarchical construction, which allows for complete design reuse across provided PDKs. The modular hierarchy also facilitates parallel circuit design iterations on cloud platforms. To illustrate its capabilities, a two-stage OpAmp with a 5T first stage, common-source second stage, and Miller compensation is implemented. We instantiate the OpAmp in two different open-source process design kits (OpenPDKs) using both room-temperature models and cryogenic (4K) models. With a human-designed version as the baseline, we leveraged the parameterization capabilities of the framework and applied the RL optimizer to adapt to the power consumption limits suitable for cryogenic applications while maintaining gain and bandwidth performance. Using the modular RL optimization framework, we achieve a 6x reduction in power consumption compared to manually designed circuits while maintaining gain to within 2%.
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) (2024)
Preview abstract
Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem. However, researchers have found it difficult to outperform fine-tuned baseline rankers on benchmark datasets. We analyze pointwise and listwise ranking prompts used by existing methods and argue that off-the-shelf LLMs do not fully understand these challenging ranking formulations. In this paper, we propose to significantly reduce the burden on LLMs by using a new technique called Pairwise Ranking Prompting (PRP). Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs. On TREC-DL 2019&2020, PRP based on the Flan-UL2 model with 20B parameters performs favorably with the previous best approach in the literature, which is based on the blackbox commercial GPT-4 that has 50x (estimated) model size, while outperforming other LLM-based solutions, such as InstructGPT which has 175B parameters, by over 10% for all ranking metrics. By using the same prompt template on seven BEIR tasks, PRP outperforms supervised baselines and outperforms the blackbox commercial ChatGPT solution by 4.2% and pointwise LLM-based solutions by more than 10% on average NDCG@10. Furthermore, we propose several variants of PRP to improve efficiency and show that it is possible to achieve competitive results even with linear complexity.
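The pairwise idea can be sketched as follows: instead of asking a model to score or order a whole list, ask only "which of these two passages is more relevant?" and aggregate the answers. Everything named here is hypothetical: `prefers` stands in for an LLM prompt, `toy_prefers` is a word-overlap placeholder rather than a model, and querying both orders of each pair is an assumption of this sketch, not something the abstract specifies. This all-pairs version is quadratic in the number of documents; the abstract notes that more efficient variants exist.

```python
from itertools import combinations

def prp_allpair(query, docs, prefers):
    """Rank docs by counting pairwise wins.

    `prefers(query, a, b)` is a hypothetical callable standing in for an LLM
    prompt; it returns True when passage a is judged more relevant than b.
    """
    wins = [0.0] * len(docs)
    for i, j in combinations(range(len(docs)), 2):
        # Ask in both orders to reduce sensitivity to position in the prompt.
        first = prefers(query, docs[i], docs[j])
        second = prefers(query, docs[j], docs[i])
        if first and not second:
            wins[i] += 1.0
        elif second and not first:
            wins[j] += 1.0
        else:
            # Inconsistent or tied answers: split the point.
            wins[i] += 0.5
            wins[j] += 0.5
    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(len(docs)), key=lambda k: -wins[k])
    return [docs[k] for k in order]

def toy_prefers(query, a, b):
    """Toy comparator: more query-term overlap wins. Not an LLM."""
    terms = set(query.lower().split())

    def overlap(text):
        return len(terms & set(text.lower().split()))

    return overlap(a) > overlap(b)

docs = [
    "cats and dogs",
    "memory allocator tuning",
    "tuning a memory allocator cache",
]
ranked = prp_allpair("memory allocator cache", docs, toy_prefers)
print(ranked)
```

Swapping `toy_prefers` for a function that formats a two-passage prompt and parses the model's choice would leave the aggregation logic unchanged.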
Characterizing a Memory Allocator at Warehouse Scale
Zhuangzhuang Zhou
Nilay Vaish
Patrick Xia
Christina Delimitrou
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Association for Computing Machinery, La Jolla, CA, USA (2024), 192–206
Preview abstract
Memory allocation constitutes a substantial component of warehouse-scale computation. Optimizing the memory allocator not only reduces the datacenter tax, but also improves application performance, leading to significant cost savings.
We present the first comprehensive characterization study of TCMalloc, a warehouse-scale memory allocator used in our production fleet. Our characterization reveals a profound diversity in the memory allocation patterns, allocated object sizes and lifetimes, for large-scale datacenter workloads, as well as in their performance on heterogeneous hardware platforms. Based on these insights, we redesign TCMalloc for warehouse-scale environments. Specifically, we propose optimizations for each level of its cache hierarchy that include usage-based dynamic sizing of allocator caches, leveraging hardware topology to mitigate inter-core communication overhead, and improving allocation packing algorithms based on statistical data. We evaluate these design choices using benchmarks and fleet-wide A/B experiments in our production fleet, resulting in a 1.4% improvement in throughput and a 3.4% reduction in RAM usage for the entire fleet. At our scale, even a single percent CPU or memory improvement translates to significant savings in server costs.
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Xiuye Gu
Jonathan Huang
Grant Schindler
Rachel Hornung
Vighnesh Birodkar
Jimmy Yan
Ming-Chang Chiu
Hassan Akbari
Josh Dillon
Agrim Gupta
Meera Hahn
Anja Hauth
David Hendon
Alonso Martinez
Kihyuk Sohn
Xuan Yang
Huisheng Wang
Lu Jiang
ICML (2024)
Preview abstract
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/
Preview abstract
Middle-mile logistics describes the problem of routing shipments through a network of hubs while respecting deadlines upon arrival. We consider that the hubs are linked by predefined lines, to which we have to assign vehicles. A very challenging aspect of the problem comes from the finite capacity of the vehicles: allocating a shipment to a given vehicle might block another one from using the same vehicle.
Typical exact solution methods, based on a multicommodity-flow formulation, scale poorly with the problem size, and real-world instances quickly become intractable. Instead, we turn to reinforcement learning (RL) by rephrasing the middle-mile problem as a multi-objective Markov decision process, where the state is a graph: the lines (edges) between the hubs and the parcels (nodes). At each round, we assign one shipment to a vehicle or decide that it stays in the same hub. The key ingredients of our proposed method are the extraction of small feature graphs from the state and the combination of graph neural networks (GNN) with model-free RL.
We use the PPO (proximal policy optimization) algorithm, which maintains both an actor and a critic, while being able to cope with a varying number of actions depending on the state. We compare linear functions and GraphNet (a particular kind of GNN) to approximate the policy and value functions. GNNs can deliver up to 40% more shipments than a linear function and both approaches scale well with the number of shipments per truck.
Preview abstract
This paper presents a multifunctional wearable sensing system that integrates flexible Laser-Induced-Graphene (LIG) based sensors and an open-source Analog Front-End (AFE) chip. The LIG sensors are fabricated on a polyimide (PI) Flexible Printed Circuit Board (FPCB) through a CO2 infrared laser direct-write method. The LIG sensors provide repeatable high-precision temperature sensing, humidity measurement, and strain detection capabilities. The temperature sensing characterization shows that the resistive LIG sensor has a sensitivity of -0.0493 %/°C, with linear-fit R-square factors ≥ 0.9973 across -40 °C to 125 °C. The capacitive humidity sensor achieves 23.6 times the capacitance at 95% relative humidity (RH) compared to the value observed in a dry environment. Our proposed AFE chip contains a hybrid folded-cascode Operational Amplifier (OPAMP) and a Successive Approximation Register Analog-to-Digital Converter (SAR ADC). Designed using an open-source analog flow and fabricated in GF180 OpenPDK, the AFE chip serves as a flexible and universal readout platform, adaptable for various sensing applications. A real-time demonstration of finger bending detection is performed to validate the functionality. The multifunctional sensing capability provided by the wearable system is attractive for personal healthcare applications. This work underscores the integration of the LIG sensors and the AFE chip, developed using open-source tools that facilitate rapid and affordable prototyping for a multifunctional flexible wearable sensing system.
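To put the reported temperature sensitivity in perspective, the short calculation below estimates the total resistance change across the characterized range. It assumes a purely linear response and measures the change relative to the resistance at the low end of the range; the function name and the choice of reference temperature are illustrative and not taken from the paper.

```python
SENSITIVITY_PCT_PER_C = -0.0493   # from the abstract, in % per °C
T_MIN_C, T_MAX_C = -40.0, 125.0   # characterized range from the abstract

def resistance_change_pct(t_c, t_ref_c):
    """Percent resistance change relative to t_ref_c under a linear model.

    Assumption: the sensitivity is constant over the whole range, which the
    reported R-square values suggest but do not strictly guarantee.
    """
    return SENSITIVITY_PCT_PER_C * (t_c - t_ref_c)

# Change from the coldest to the hottest characterized temperature.
span_pct = resistance_change_pct(T_MAX_C, T_MIN_C)
print(f"{span_pct:.4f} %")
```

Under these assumptions the resistance falls by roughly 8% over the 165 °C span, a small signal that helps explain why the readout chain includes a dedicated amplifier and ADC.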
Storage Systems For Real-Time Personalized Recommendations
Jayasekhar Konduru
Aqsa Fulara
DZone (2024)
Preview abstract
This article explores the demands of real-time personalized recommendation systems, focusing on data storage challenges and solutions. We'll present common storage solutions suitable for such systems and outline best practices.