Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10132 publications
AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
Yuanwen Yue
Sabarinath Mahadevan
Jonas Schult
Francis Engelmann
Bastian Leibe
Konrad Schindler
Theodora Kontogianni
ICLR (2024)
Preview abstract
During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. The current best practice formulates the problem as binary classification and segments objects one at a time. The model expects the user to provide positive clicks to indicate regions wrongly assigned to the background and negative clicks on regions wrongly assigned to the object. Sequentially visiting objects is wasteful since it disregards synergies between objects: a positive click for a given object can, by definition, serve as a negative click for nearby objects. Moreover, a direct competition between adjacent objects can speed up the identification of their common boundary. We introduce AGILE3D, an efficient, attention-based model that (1) supports simultaneous segmentation of multiple 3D objects, (2) yields more accurate segmentation masks with fewer user clicks, and (3) offers faster inference. Our core idea is to encode user clicks as spatial-temporal queries and enable explicit interactions between click queries as well as between them and the 3D scene through a click attention module. Every time new clicks are added, we only need to run a lightweight decoder that produces updated segmentation masks. In experiments with four different 3D point cloud datasets, AGILE3D sets a new state-of-the-art. Moreover, we also verify its practicality in real-world setups with real user studies. Project page: https://ywyue.github.io/AGILE3D.
View details
Searching for Dermatology Information Online using Images vs Text: a Randomized Study
Jay Hartford
Amit Talreja
Natalie Salaets
Kimberley Raiford
Jay Nayar
Dounia Berrada
Harsh Kharbanda
Lou Wang
Peggy Bui
medRxiv (2024)
Preview abstract
Background: Skin conditions are extremely common worldwide, and are an important cause of both anxiety and morbidity. Since the advent of the internet, individuals have used text-based search (eg, “red rash on arm”) to learn more about concerns on their skin, but this process is often hindered by the inability to accurately describe the lesion’s morphology. In the study, we surveyed respondents’ experiences with an image-based search, compared to the traditional text-based search experience.
Methods: An internet-based survey was conducted to evaluate the experience of text-based vs image-based search for skin conditions. We recruited respondents from an existing cohort of volunteers in a commercial survey panel; survey respondents that met inclusion/exclusion criteria, including willingness to take photos of a visible concern on their body, were enrolled. Respondents were asked to use the Google mobile app to conduct both regular text-based search (Google Search) and image-based search (Google Lens) for their concern, with the order of text vs. image search randomized. Satisfaction for each search experience along six different dimensions were recorded and compared, and respondents’ preferences for the different search types along these same six dimensions were recorded.
Results: 372 respondents were enrolled in the study, with 44% self-identifying as women, 86% as White and 41% over age 45. The rate of respondents who were at least moderately familiar with searching for skin conditions using text-based search versus image-based search were 81.5% and 63.5%, respectively. After using both search modalities, respondents were highly satisfied with both image-based and text-based search, with >90% at least somewhat satisfied in each dimension and no significant differences seen between text-based and image-based search when examining the responses on an absolute scale per search modality. When asked to directly rate their preferences in a comparative way, survey respondents preferred image-based search over text-based search in 5 out of 6 dimensions, with an absolute 9.9% more preferring image-based search over text-based search overall (p=0.004). 82.5% (95% CI 78.2 - 86.3) reported a preference to leverage image-based search (alone or in combination with text-based search) in future searches. Of those who would prefer to use a combination of both, 64% indicated they would like to start with image-based search, indicating that image-based search may be the preferred entry point for skin-related searches.
Conclusion: Despite being less familiar with image-based search upon study inception, survey respondents generally preferred image-based search to text-based search and overwhelmingly wanted to include this in future searches. These results suggest the potential for image-based search to play a key role in people searching for information regarding skin concerns.
View details
Stable quantum-correlated many-body states through engineered dissipation
Xiao Mi
Alexios Michailidis
Sara Shabani
Jerome Lloyd
Rajeev Acharya
Igor Aleiner
Trond Andersen
Markus Ansmann
Frank Arute
Kunal Arya
Juan Atalaya
Gina Bortoli
Alexandre Bourassa
Leon Brill
Michael Broughton
Bob Buckley
Tim Burger
Nicholas Bushnell
Jimmy Chen
Benjamin Chiaro
Desmond Chik
Charina Chou
Josh Cogan
Roberto Collins
Paul Conner
William Courtney
Alex Crook
Ben Curtin
Alejo Grajales Dau
Dripto Debroy
Agustin Di Paolo
ILYA Drozdov
Andrew Dunsworth
Lara Faoro
Edward Farhi
Reza Fatemi
Vinicius Ferreira
Ebrahim Forati
Brooks Foxen
Élie Genois
William Giang
Dar Gilboa
Raja Gosula
Steve Habegger
Michael Hamilton
Monica Hansen
Sean Harrington
Paula Heu
Markus Hoffmann
Trent Huang
Ashley Huff
Bill Huggins
Sergei Isakov
Justin Iveland
Cody Jones
Pavol Juhas
Kostyantyn Kechedzhi
Marika Kieferova
Alexei Kitaev
Andrey Klots
Alexander Korotkov
Fedor Kostritsa
John Mark Kreikebaum
Dave Landhuis
Pavel Laptev
Kim Ming Lau
Lily Laws
Joonho Lee
Kenny Lee
Yuri Lensky
Alexander Lill
Wayne Liu
Orion Martin
Amanda Mieszala
Shirin Montazeri
Alexis Morvan
Ramis Movassagh
Wojtek Mruczkiewicz
Charles Neill
Ani Nersisyan
Michael Newman
JiunHow Ng
Murray Ich Nguyen
Tom O'Brien
Alex Opremcak
Andre Petukhov
Rebecca Potter
Leonid Pryadko
Charles Rocque
Negar Saei
Kannan Sankaragomathi
Henry Schurkus
Christopher Schuster
Mike Shearn
Aaron Shorter
Noah Shutty
Vladimir Shvarts
Jindra Skruzny
Clarke Smith
Rolando Somma
George Sterling
Doug Strain
Marco Szalay
Alfredo Torres
Guifre Vidal
Cheng Xing
Jamie Yao
Ping Yeh
Juhwan Yoo
Grayson Young
Yaxing Zhang
Ningfeng Zhu
Jeremy Hilton
Anthony Megrant
Yu Chen
Vadim Smelyanskiy
Dmitry Abanin
Science, 383 (2024), pp. 1332-1337
Preview abstract
Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-range quantum correlations and a ground-state fidelity of 0.86 for 18 qubits at the critical point. In two dimensions, we found mutual information that extends beyond nearest neighbors. Lastly, by coupling the system to auxiliaries emulating reservoirs with different chemical potentials, we explored transport in the quantum Heisenberg model. Our results establish engineered dissipation as a scalable alternative to unitary evolution for preparing entangled many-body states on noisy quantum processors.
View details
Preview abstract
In this paper, we present SCOREQ, a novel approach for speech quality prediction. SCOREQ is a triplet loss function for contrastive regression that addresses the domain generalisation shortcoming exhibited by state of the art no-reference speech quality metrics. In the paper we: (i) illustrate the problem of L2 loss training failing at capturing the continuous nature of the mean opinion score (MOS) labels; (ii) demonstrate the lack of generalisation through a benchmarking evaluation across several speech domains; (iii) outline our approach and explore the impact of the architectural design decisions through incremental evaluation; (iv) evaluate the final model against state of the art models for a wide variety of data and domains. The results show that the lack of generalisation observed in state of the art speech quality metrics is addressed by SCOREQ. We conclude that using a triplet loss function
View details
Preview abstract
Motivated by the necessity of guiding and monitoring students' progress in real-time when assembling circuits during in-class activities we propose BlinkBoard, an augmented breadboard to enhance offline as well as online physical computing classes. BlinkBoard uses LEDs placed on each row of the breadboard to guide, via four blinking patterns, how to place and connect components and wires. It also uses a set of Input/Output pins to sense voltage levels at user-specified rows or to generate voltage output. Our hardware uses an open JSON protocol of commands and responses that can be integrated with a graphical application hosted on a computer that ensures bidirectional communication between each of the students' BreadBoard and the instructor's dashboard and slides. The hardware is affordable and simple, partially due to a customized circuit configured via a hardware description language that handles the LEDs' patterns with minimal load on the Arduino micro-controller. Finally, we briefly show how this hardware made its way to a workshop with high-school students and an undergraduate class in a design department.
View details
Preview abstract
We propose a new Markov Decision Process (MDP) model for ad auctions to capture the
user response to the quality of ads, with the objective of maximizing the long-term discounted
revenue. By incorporating user response, our model takes into consideration all three parties
involved in the auction (advertiser, auctioneer, and user). The state of the user is modeled as a
user-specific click-through rate (CTR) with the CTR changing in the next round according to the
set of ads shown to the user in the current round. We characterize the optimal mechanism for this MDP as a Myerson’s auction with a notion of modified virtual value, which relies on the value distribution of the advertiser, the current user state, and the future impact of showing the ad to the user. Leveraging this characterization, we design a sample-efficient and computationally-efficient algorithm which outputs an approximately optimal policy that requires only sample access to the true MDP and the value distributions of the bidders. Finally, we propose a simple mechanism built upon second price auctions with personalized reserve prices and show it can achieve a constant-factor approximation to the optimal long term discounted revenue.
View details
Preview abstract
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal's utility. We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent picks their policy selfishly to maximize their reward, which is the sum of the original and the offered reward. Our results establish the NP-hardness of the problem and offer polynomial approximation algorithms for two classes of instances: Stochastic trees and deterministic decision processes with a finite horizon.
View details
Rapid initial state preparation for the quantum simulation of strongly correlated molecules and materials
Dominic Berry
Yu Tong
Alec White
Tae In Kim
Lin Lin
Seunghoon Lee
Garnet Chan
arXiv:2409.11748 (2024)
Preview abstract
Studies on quantum algorithms for ground state energy estimation often assume perfect ground state preparation; however, in reality the initial state will have imperfect overlap with the true ground state. Here we address that problem in two ways: by faster preparation of matrix product state (MPS) approximations, and more efficient filtering of the prepared state to find the ground state energy. We show how to achieve unitary synthesis with a Toffoli complexity about $7 \times$ lower than that in prior work, and use that to derive a more efficient MPS preparation method. For filtering we present two different approaches: sampling and binary search. For both we use the theory of window functions to avoid large phase errors and minimise the complexity. We find that the binary search approach provides better scaling with the overlap at the cost of a larger constant factor, such that it will be preferred for overlaps less than about 0.003. Finally, we estimate the total resources to perform ground state energy estimation of FeMoco and Iron cluster systems by estimating ground state overlap on an MPS initial state through extrapolation. With a modest bond dimension of 4000 we estimate a 0.96 overlap squared value producing total resources of $7.5 \times 10^{10}$ Toffoli gates; validating naive estimates where we assume perfect ground state overlap. These extrapolations allay practical concerns of exponential overlap decay in challenging-to-compute chemical systems.
View details
Preview abstract
We present an approach to modeling an image-space prior on scene motion. Our prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics such as trees, flowers, candles, and clothes swaying in the wind. We model this dense, long-term motion prior in the Fourier domain:given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video. Along with an image-based rendering module, these trajectories can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to realistically interact with objects in real pictures by interpreting the spectral volumes as image-space modal bases, which approximate object dynamics.
View details
Exploring the Feasibility of Remote Cardiac Auscultation Using Earphones
Tao Chen
Yongjie Yang
Xiuzhen Guo
Jie Xiong
Shangguan Longfei
MobiCom 2024: The 30th Annual International Conference On Mobile Computing And Networking
Preview abstract
The elderly over 65 accounts for 80% of COVID deaths in the United States. In response to the pandemic, the federal, state governments, and commercial insurers are promoting video visits, through which the elderly can access specialists at home over the Internet, without the risk of COVID exposure. However, the current video visit practice barely relies on video observation and talking. The specialist could not assess the patient's health conditions by performing auscultations.
This paper tries to address this key missing component in video visits by proposing Asclepius, a hardware-software solution that turns the patient's earphones into a stethoscope, allowing the specialist to hear the patient's fine-grained heart sound (i.e., PCG signals) in video visits. To achieve this goal, we contribute a low-cost plug-in peripheral that repurposes the earphone's speaker into a microphone and uses it to capture the patient's minute PCG signals from her ear canal. As the PCG signals suffer from strong attenuation and multi-path effects when propagating from the heart to ear canals, we then propose efficient signal processing algorithms coupled with a data-driven approach to de-reverberate and further correct the amplitude and frequency distortion in raw PCG receptions. We implement Asclepius on a 2-layer PCB board and follow the IRB protocol to evaluate its performance with 30 volunteers. Our extensive experiments show that Asclepius can effectively recover Phonocardiogram (PCG) signals with different types of earphones. The feedback from cardiologists also confirms the efficacy and efficiency of our system. PCG signal samples and benchmark results can be found at an anonymous link https://asclepius-system.github.io/
View details
Bridging the Preference Gap between Retrievers and LLMs
Zixuan Ke
Qiaozhu Mei
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (2024) (to appear)
Preview abstract
Large Language Models (LLMs) have demonstrated superior results across a wide range of tasks, and Retrieval-augmented Generation (RAG) is an effective way to enhance the performance by locating relevant information and placing it into the context window of the LLM. However, the relationship between retrievers and LLM in a RAG is still under-investigated. Most existing work treats the retriever and the LLM as independent components and leaves a gap between retrieving human-"friendly" information and assembling a LLM-"friendly" context. In this work, we examine a novel bridge mechanism. We validate the ranking and selection assumptions of retrievers in the context of RAG and propose a framework that chains together supervised and reinforcement learning to train a bridge model that optimizes the connection between the retriever and the LLM. Empirical results demonstrate the effectiveness of our method in both question-answering and personalized generation tasks.
View details
RewriteLM: An Instruction-Tuned Large LanguageModel for Text Rewriting
Yun Zhu
Simon Tong
Lei Meng
Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18970-18980 (2024)
Preview abstract
In recent years, Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities in text generation tasks expressed through natural language instructions. However, text rewriting is a challenging task, and unintended modifications can negatively impact the system's performance. To address this challenge, we introduce a novel benchmark for text rewriting that covers a wide variety of rewriting types expressed through natural language instructions. Unlike previous benchmarks, which were primarily focused on limited rewrite styles and sentence-level rewriting, our benchmark is specifically designed to facilitate open-ended rewriting of long-form text. Additionally, we present a strong baseline model, RewriteLM, which is an instruction-tuned large language model for text rewriting. The model is trained using supervised fine-tuning, reward training, and reinforcement learning. To minimize human intervention in the data collection process, we develop new data generation strategies: (1) utilizing high-quality, long-form edits from Wikipedia as our primary natural training data source, (2) generating a synthetic dataset that includes diverse edit types and non-Wiki domains using chain-of-thoughts and the capabilities of LLMs, and (3) employing human-designed heuristic rankers to generate preference data. Our experiments demonstrate the effectiveness of our proposed benchmark and baseline model, as well as the benefits of our data collection strategies in minimizing human intervention.
View details
How we use GenAI in SRE
CommitConf, Madrid (2024)
Preview abstract
Google services are powered by the largest network of computers in the world. Site Reliabity Engineers (SRE) make sure that the whole stack is cool: datacenters are safe, well provisionedl; we have fallback mechanims, and data integrity; to making sure we design our stack properly, using the right storage, replication and software trade-offs.
Generative AI is a great tool to make us super-effective: having access to tools to generate our most toily configurations, to classify risks and events, to manage large swaths of machines with agents or to automate complex workflows cheaply.
This talk will cover the journey that SRE started years ago to become a truly AI-First discipline and the latest advancements in tooling, practices and workflows.
View details
CodeQueries: A Dataset of Semantic Queries over Code
Surya Prakash Sahu
Madhurima Mandal
Shikhar Bharadwaj
Aditya Kanade
Shirish Shevade
Innovations in Software Engineering (ISEC), ACM, Bangalore, India (2024)
Preview abstract
Developers often have questions about semantic aspects of code
they are working on, e.g., “Is there a class whose parent classes
declare a conflicting attribute?”. Answering them requires understanding code semantics such as attributes and inheritance relation
of classes. An answer to such a question should identify code spans
constituting the answer (e.g., the declaration of the subclass) as well
as supporting facts (e.g., the definitions of the conflicting attributes).
The existing work on question-answering over code has considered
yes/no questions or method-level context. We contribute a labeled
dataset, called CodeQueries, of semantic queries over Python code.
Compared to the existing datasets, in CodeQueries, the queries
are about code semantics, the context is file level and the answers
are code spans. We curate the dataset based on queries supported
by a widely-used static analysis tool, CodeQL, and include both
positive and negative examples, and queries requiring single-hop
and multi-hop reasoning.
To assess the value of our dataset, we evaluate baseline neural
approaches. We study a large language model (GPT3.5-Turbo) in
zero-shot and few-shot settings on a subset of CodeQueries. We
also evaluate a BERT style model (CuBERT) with fine-tuning. We
find that these models achieve limited success on CodeQueries.
CodeQueries is thus a challenging dataset to test the ability of
neural models, to understand code semantics, in the extractive
question-answering setting
View details
Preview abstract
We describe a quantum algorithm for the Planted Noisy kXOR problem (also known as sparse Learning Parity with Noise) that achieves a nearly quartic (4th power) speedup over the best known classical algorithm while also only using logarithmically many qubits. Our work generalizes and simplifies prior work of Hastings, by building on his quantum algorithm for the Tensor Principal Component Analysis (PCA) problem. We achieve our quantum speedup using a general framework based on the Kikuchi Method (recovering the quartic speedup for Tensor PCA), and we anticipate it will yield similar speedups for further planted inference problems. These speedups rely on the fact that planted inference problems naturally instantiate the Guided Sparse Hamiltonian problem. Since the Planted Noisy kXOR problem has been used as a component of certain cryptographic constructions, our work suggests that some of these are susceptible to super-quadratic quantum attacks.
View details