Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10092 publications
Preview abstract
Current approaches to Analog Layout Automation
apply ML techniques such as Graph Convolutional Neural
Networks (GCN) to translate netlist to layout. While these ML
approaches have proven to be effective, they lack the powerful
reasoning capabilities, an intuitive human interface, and standard
evaluation benchmarks that have been improving at a rapid de-
velopment pace in Large Language Models (LLMs). The GLayout
framework introduced in this work translates analog layout into
an expressive, technology generic, compact text representation.
Then, an LLM is taught to understand analog layout through
fine-tuning and in-context learning using Retrieval Augmented
Generation (RAG). The LLM is able to successfully layout unseen
circuits based on new information provided in-context. We train
3.8, 7, and 22 Billion parameter quantized LLMs on a dataset
of less than 50 unique circuits, and text documents providing
layout knowledge. The 22B parameter model is tuned in 2 hours
on a single NVIDIA A100 GPU. The open-source evaluation
set is proposed as an automation benchmark for LLM layout
automation tasks, and ranges from 2-transistor circuits to a
∆Σ ADC. The 22B model completes 70% of the tasks in the
evaluation set, and is able to pass DRC and LVS verification on
unseen 4 transistor blocks.
View details
Preview abstract
We're roughly 10 years into the OpenConfig journey. We have implementations in hand from various vendors, and we've gained significant operational experience in the domains of Streaming Telemetry and in Developing Configuration Systems to leverage the developed models. What have we learned? Are the abstractions we've generated the right ones? If not, why? Were we too influenced by the tools and inertia of the time when we made some critical decisions? How do we need to evolve going forward? This discussion is part retrospective/introspective, a candid look at where we've been and what we need to think about as we evolve the next generation of our management (and control) planes. What should we be thinking about as network engineers who write software?
View details
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Gilles Baechler
Srinivas Sunkara
Maria Wang
Hassan Mansoor
Vincent Etter
Jason Lin
(2024)
Preview abstract
Screen user interfaces (UIs) and infographics, sharing similar visual language and design principles, play important roles in human communication and human-machine interaction. We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding. Our model improves upon the PaLI architecture with the flexible patching strategy of pix2struct and is trained on a unique mixture of datasets. At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements. We use these text annotations to describe screens to Large Language Models and automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. We run ablation studies to demonstrate the impact of these design choices. At only 5B parameters, ScreenAI achieves new state-of-the-artresults on UI- and infographics-based tasks (Multi-page DocVQA, WebSRC, MoTIF and Widget Captioning), and new best-in-class performance on others (Chart QA, DocVQA, and InfographicVQA) compared to models of similar size. Finally, we release three new datasets: one focused on the screen annotation task and two others focused on question answering.
View details
Wear's my Data? Understanding the Cross-Device Runtime Permission Model in Wearables
Doguhan Yeke
Muhammad Ibrahim
Habiba Farukh
Abdullah Imran
Antonio Bianchi
Z. Berkay Celik
IEEE Security and Privacy (2024) (to appear)
Preview abstract
Wearable devices are becoming increasingly important, helping us stay healthy and connected. There are a variety
of app-based wearable platforms that can be used to manage
these devices. The apps on wearable devices often work with a
companion app on users’ smartphones. The wearable device and
the smartphone typically use two separate permission models
that work synchronously to protect sensitive data. However, this
design creates an opaque view of the management of permission-
protected data, resulting in over-privileged data access without
the user’s explicit consent. In this paper, we performed the first
systematic analysis of the interaction between the Android and
Wear OS permission models. Our analysis is two-fold. First,
through taint analysis, we showed that cross-device flows of
permission-protected data happen in the wild, demonstrating
that 28 apps (out of the 150 we studied) on Google Play
have sensitive data flows between the wearable app and its
companion app. We found that these data flows occur without
the users’ explicit consent, introducing the risk of violating
user expectations. Second, we conducted an in-lab user study
to assess users’ understanding of permissions when subject to
cross-device communication (n = 63). We found that 66.7% of
the users are unaware of the possibility of cross-device sensitive
data flows, which impairs their understanding of permissions in
the context of wearable devices and puts their sensitive data at
risk. We also showed that users are vulnerable to a new class of
attacks that we call cross-device permission phishing attacks on
wearable devices. Lastly, we performed a preliminary study on
other watch platforms (i.e., Apple’s watchOS, Fitbit, Garmin
OS) and found that all these platforms suffer from similar
privacy issues. As countermeasures for the potential privacy
violations in cross-device apps, we suggest improvements in the
system prompts and the permission model to enable users to
make better-informed decisions, as well as on app markets to
identify malicious cross-device data flows.
View details
Limoncello: Prefetchers for Scale
Carlos Villavieja
Baris Kasikci
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Association for Computing Machinery, New York, NY, United States (2024)
Preview abstract
This paper presents Limoncello, a novel software system that dynamically configures data prefetching for high utilization systems. We demonstrate that in resource-constrained environments, such as large data centers, traditional methods of hardware prefetching can increase memory latency and decrease available memory bandwidth. To address this, Limoncello dynamically configures data prefetching, disabling hardware prefetchers when memory bandwidth utilization is high and leveraging targeted software prefetching to reduce cache misses when hardware prefetchers are disabled. Limoncello is software-centric and does not require any modifications to hardware. Our evaluation of the deployment on a real-world hyperscale system reveals that Limoncello unlocks significant performance gains for high-utilization systems: it improves application throughput by 10%, due to a 15% reduction in memory latency, while maintaining minimal change in cache miss rate for targeted library functions.
View details
"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output
Michael Xieyang Liu
Frederick Liu
Alex Fiannaca
Terry Koo
In Extended Abstract in ACM CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024), pp. 9 (to appear)
Preview abstract
Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adhere to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.
View details
(In)Security of File Uploads in Node.js
Harun Oz
Abbas Acar
Ahmet Aris
Amin Kharraz
Selcuk Uluagac
The Web conference (WWW) (2024)
Preview abstract
File upload is a critical feature incorporated by a myriad of web
applications to enable users to share and manage their files conveniently. It has been used in many useful services such as file-sharing
and social media. While file upload is an essential component of
web applications, the lack of rigorous checks on the file name, type,
and content of the uploaded files can result in security issues, often
referred to as Unrestricted File Upload (UFU). In this study, we analyze the (in)security of popular file upload libraries and real-world
applications in the Node.js ecosystem. To automate our analysis, we
propose NodeSec– a tool designed to analyze file upload insecurities in Node.js applications and libraries. NodeSec generates unique
payloads and thoroughly evaluates the application’s file upload security against 13 distinct UFU-type attacks. Utilizing NodeSec, we
analyze the most popular file upload libraries and real-world ap-
plications in the Node.js ecosystem. Our results reveal that some
real-world web applications are vulnerable to UFU attacks and dis-
close serious security bugs in file upload libraries. As of this writing,
we received 19 CVEs and two US-CERT cases for the security issues that we reported. Our findings provide strong evidence that
the dynamic features of Node.js applications introduce security
shortcomings and that web developers should be cautious when
implementing file upload features in their applications.
View details
Preview abstract
The InterPlanetary File System (IPFS) is on its way to becoming the backbone of the next generation of the web. However, it suffers from several performance bottlenecks, particularly on the content retrieval path, which are often difficult to debug. This is because content retrieval involves multiple peers on the decentralized network and the issue could lie anywhere in the network. Traditional debugging tools are insufficient to help web developers who face the challenge of slow loading websites and detrimental user experience. This limits the adoption and future scalability of IPFS.
In this paper, we aim to gain valuable insights into how content retrieval requests propagate within the IPFS network as well as identify potential performance bottlenecks which could lead to opportunities for improvement. We propose a custom tracing framework that generates and manages traces for crucial events that take place on each peer during content retrieval. The framework leverages event semantics to build a timeline of each protocol involved in the retrieval, helping developers pinpoint problems. Additionally, it is resilient to malicious behaviors of the peers in the decentralized environment.
We have implemented this framework on top of an existing IPFS implementation written in Java called Nabu. Our evaluation shows that the framework can identify network delays and issues with each peer involved in content retrieval requests at a very low overhead.
View details
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Michael Xieyang Liu
Krystal Kallarackal
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024)
Preview abstract
Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from two models are qualitatively different. We iteratively designed and developed the tool by closely working with researchers and engineers at Google. This paper details the user challenges we identified, the design and development of the tool, and an observational study with participants who regularly evaluate their models.
View details
Preview abstract
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs. Larger LM responses that closely align with the smaller LMs' output, which relies exclusively on cited documents, are verified. Responses showing discrepancies are iteratively refined through a feedback loop. Experiments on three open-domain question-answering datasets demonstrate significant performance gains of 1.5% to 7% absolute average without any required model fine-tuning.
View details
Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM, Honolulu, HI, USA (2024), pp. 9
Preview abstract
Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data practitioners often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific, e.g., instruments and genres for a music dataset, or diseases and symptoms for a medical dataset. Accordingly, data practitioners often run custom analyses for each dataset, which is cumbersome and difficult, or use unsupervised methods. We present AutoHistograms, a visualization tool leveraging LLMs. AutoHistograms automatically identifies relevant entity-based features, visualizes their distributions, and allows the user to interactively query the dataset for new categories of entities. In a user study with (n=10) data practitioners, we observe that participants were able to quickly onboard to AutoHistograms, use the tool to identify actionable insights, and conceptualize a broad range of applicable use cases. We also describe a variety of usage scenarios from different types of users to highlight how this app can provide value in many different contexts. Finally, we present a quantitative evaluation of the tool. Together, this tool and user study contribute to the growing field of LLM-assisted sensemaking tools.
View details
Preview abstract
Simple, sufficient explanations
furnished by short decision lists
can be useful for guiding stakeholder actions.
Unfortunately, this transparency can come at the expense
of the higher accuracy enjoyed by black box methods,
like deep nets.
To date, practitioners typically either (i) insist on the simpler model, forsaking accuracy; or (ii) insist on maximizing accuracy, settling for post-hoc explanations of dubious faithfulness.
In this paper, we propose a hybrid \emph{partially interpretable model} that represents a compromise between the two extremes.
In our setup, each input is first processed by a decision list that can either execute a decision or abstain,
handing off authority to the opaque model.
The key to optimizing the decision list is to optimally
trade off the accuracy of the composite system
against coverage (the fraction of the population
that receives explanations).
We contribute a new principled algorithm for constructing partially interpretable decision lists,
providing theoretical guarantees
addressing both interpretability and accuracy.
As an instance of our result, we prove
that when the optimal decision list has length $k$, coverage $c$, and $b$ mistakes,
our algorithm will generate a decision list
that has length no greater than $4k$,
coverage at least $c/2$,
and makes at most $4b$ mistakes.
Finally, we empirically validate the effectiveness of the new model.
View details
USM-SCD: USM-Based Multilingual Speaker Change Detection
Yongqiang Wang
Jason Pelecanos
Yu Zhang
Yiling Huang
Han Lu
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 11801-11805
Preview abstract
We introduce a multilingual speaker change detection model (USM- SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, demonstrating the utility of fine-tuning from a large generic foundation model for a downstream task. We analyze the performance of this multilingual speaker change detection model through a series of ablation studies. We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages. On American English, the USM-SCD model can achieve an 85.8% speaker change detection F1 score across various public and internal test sets, beating the previous monolingual baseline model by 21% relative. We also show that we only need to fine-tune one-quarter of the trainable model parameters to achieve the best model performance. The USM-SCD model exhibits state-of-the-art ASR quality compared with a strong public ASR baseline, making it suitable to handle both tasks with negligible additional computational cost.
View details
Comparative analysis of genAI features in Business Intelligence Platforms
Aqsa Fulara
International Journal of Computer Trends and Technology, Volume 72 Issue 4, 95-101, April 2024 (2024)
Preview abstract
The study is a comparative analysis of generative AI capabilities and their applications in BI plaforms. The rapid advancement here has opened new frontiers for data driven decision making and insights generation. However, integration in BI tools is largely unexplored in academia. The findings reveal significant variations in approach taken by different BI tools for similar genAI tasks.
View details
Non-uniform Bid-scaling and Equilibria for Different Auctions: An Empirical Study
Proceedings of the ACM on Web Conference 2024, 256–266
Preview abstract
In recent years, the growing adoption of autobidding has motivated the study of auction design with value-maximizing auto-bidders. It is known that under mild assumptions, uniform bid-scaling is an optimal bidding strategy in truthful auctions, e.g., Vickrey-Clarke-Groves auction (VCG), and the price of anarchy for VCG is 2. However, for other auction formats like First-Price Auction (FPA) and Generalized Second-Price auction (GSP), uniform bid-scaling may not be an optimal bidding strategy, and bidders have incentives to deviate to adopt strategies with non-uniform bid-scaling. Moreover, FPA can achieve optimal welfare if restricted to uniform bid-scaling, while its price of anarchy becomes 2 when non-uniform bid-scaling strategies are allowed.
All these price of anarchy results have been focused on welfare approximation in the worst-case scenarios. To complement theoretical understandings, we empirically study how different auction formats (FPA, GSP, VCG) with different levels of non-uniform bid-scaling perform in an autobidding world with a synthetic dataset for auctions. Our empirical findings include: * For both uniform bid-scaling and non-uniform bid-scaling, FPA is better than GSP and GSP is better than VCG in terms of both welfare and profit; * A higher level of non-uniform bid-scaling leads to lower welfare performance in both FPA and GSP, while different levels of non-uniform bid-scaling have no effect in VCG. Our methodology of synthetic data generation may be of independent interest.
View details