Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
A Decade of Privacy-Relevant Android App Reviews: Large Scale Trends
Omer Akgul
Michelle Mazurek
Benoit Seguin
Preview abstract
We present an analysis of 12 million instances of privacy-relevant reviews publicly visible on the Google Play Store that span a 10-year period. By leveraging state-of-the-art NLP techniques, we examine what users have been writing about privacy along multiple dimensions: time, countries, app types, diverse privacy topics, and even across a spectrum of emotions. We find consistent growth of privacy-relevant reviews, and explore topics that are trending (such as Data Deletion and Data Theft), as well as those on the decline (such as privacy-relevant reviews on sensitive permissions). We find that although privacy reviews come from more than 200 countries, 33 countries provide 90% of privacy reviews. We conduct a comparison across countries by examining the distribution of privacy topics a country’s users write about, and find that geographic proximity is not a reliable indicator that nearby countries have similar privacy perspectives. We uncover some countries with unique patterns and explore those herein. Surprisingly, we uncover that it is not uncommon for reviews that discuss privacy to be positive (32%); many users express pleasure about privacy features within apps or privacy-focused apps. We also uncover some unexpected behaviors, such as the use of reviews to deliver privacy disclaimers to developers. Finally, we demonstrate the value of analyzing app reviews with our approach as a complement to existing methods for understanding users' perspectives about privacy.
View details
With Great Power Comes Great Responsibility: Security and Privacy Issues of Modern Browser APIs
Harun Oz
Daniele Cono D’Elia
Abbas Acar
Riccardo Lazzeretti
Selcuk Uluagac
IEEE Security and Privacy (2024)
Preview abstract
This paper discusses security and privacy issues in modern Browser APIs by categorizing them based on their functionality. With this study, we aim to alert the community about these issues and motivate further research into analyzing the security and privacy concerns within modern Browser APIs.
View details
Preview abstract
Motivated by recent advances in large language models for NLP, we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of datasets matches the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series dataset, and can work well across different forecasting history lengths, prediction lengths and temporal granularities.
View details
Understanding Use Cases for AI-Powered Visual Interpretation Services
Ricardo Gonzalez
Jazmin Collins
Shiri Azenkot
CHI Conference on Human-Computer Interaction (2024)
Preview abstract
"Scene description" applications that describe visual content in a photo are useful daily tools for blind and low vision (BLV) people. Researchers have studied their use, but they have only explored those that leverage remote sighted assistants; little is known about applications that use AI to generate their descriptions. Thus, to investigate their use cases, we conducted a two-week diary study where 16 BLV participants used an AI-powered scene description application we designed. Through their diary entries and follow-up interviews, users shared their information goals and assessments of the visual descriptions they received. We analyzed the entries and found frequent use cases, such as identifying visual features of known objects, and surprising ones, such as avoiding contact with dangerous objects. We also found users scored the descriptions relatively low on average, 2.76 out of 5 (SD=1.49) for satisfaction and 2.43 out of 4 (SD=1.16) for trust, showing that descriptions still need significant improvements to deliver satisfying and trustworthy experiences. We discuss future opportunities for AI as it becomes a more powerful accessibility tool for BLV users.
View details
Augmented Object Intelligence with XR-Objects
Mustafa Doga Dogan
Karan Ahuja
Andrea Colaco
Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST), ACM (2024), pp. 1-15
Preview abstract
Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper explores Augmented Object Intelligence (AOI) in the context of XR, an interaction paradigm that aims to blur the lines between digital and physical by equipping real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a portal to digital functionalities. Our approach utilizes real-time object segmentation and classification, combined with the power of Multimodal Large Language Models (MLLMs), to facilitate these interactions without the need for object pre-registration. We implement the AOI concept in the form of XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in contextually relevant ways using object-based context menus. This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) detail the XR-Objects system’s open-source design and implementation, and (3) show its versatility through various use cases and a user study.
View details
Preview abstract
In this talk, we will introduce the development and evolution of speaker diarization technologies at Google in the past decade, and how they landed as impactful products such as Cloud Speech-to-Text and the Pixel Recorder app. The talk will cover four critical milestones of the speaker diarization technologies at Google: (1) leveraging deep speaker embeddings; (2) leveraging supervised clustering; (3) leveraging sequence transducers; and (4) leveraging large language models. The talk will also discuss how speaker diarization will evolve in the new era of multimodal large language models.
View details
Preview abstract
Recent significant advances in text-to-image models unlock the possibility of training vision systems using synthetic images, potentially overcoming the difficulty of collecting curated data at scale. It is unclear, however, how these models behave at scale, as more synthetic data is added to the training set. In this paper we study the scaling laws of synthetic images generated by state-of-the-art text-to-image models, for the training of supervised models: image classifiers with label supervision, and CLIP with language supervision. We identify several factors, including text prompts, classifier-free guidance scale, and types of text-to-image models, that significantly affect scaling behavior. After tuning these factors, we observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training, while they significantly underperform in scaling when training supervised image classifiers. Our analysis indicates that the main reason for this underperformance is the inability of off-the-shelf text-to-image models to generate certain concepts, a limitation that significantly impairs the training of image classifiers. Our findings also suggest that scaling synthetic data can be particularly effective in scenarios such as: (1) when there is a limited supply of real images for a supervised problem (e.g., fewer than 0.5 million images in ImageNet), (2) when the evaluation dataset diverges significantly from the training data, indicating the out-of-distribution scenario, or (3) when synthetic data is used in conjunction with real images, as demonstrated in the training of CLIP models.
View details
AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
Yuanwen Yue
Sabarinath Mahadevan
Jonas Schult
Francis Engelmann
Bastian Leibe
Konrad Schindler
Theodora Kontogianni
ICLR (2024)
Preview abstract
During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. The current best practice formulates the problem as binary classification and segments objects one at a time. The model expects the user to provide positive clicks to indicate regions wrongly assigned to the background and negative clicks on regions wrongly assigned to the object. Sequentially visiting objects is wasteful since it disregards synergies between objects: a positive click for a given object can, by definition, serve as a negative click for nearby objects. Moreover, a direct competition between adjacent objects can speed up the identification of their common boundary. We introduce AGILE3D, an efficient, attention-based model that (1) supports simultaneous segmentation of multiple 3D objects, (2) yields more accurate segmentation masks with fewer user clicks, and (3) offers faster inference. Our core idea is to encode user clicks as spatial-temporal queries and enable explicit interactions between click queries as well as between them and the 3D scene through a click attention module. Every time new clicks are added, we only need to run a lightweight decoder that produces updated segmentation masks. In experiments with four different 3D point cloud datasets, AGILE3D sets a new state-of-the-art. Moreover, we also verify its practicality in real-world setups with real user studies. Project page: https://ywyue.github.io/AGILE3D.
View details
Preview abstract
Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.
View details
Creativity, Generative AI, and Software Development: A Research Agenda
Victoria Jackson
Bogdan Vasilescu
Daniel Russo
Paul Ralph
Maliheh Izadi
Rafael Prikladnicki
Anielle Lisboa
Andre van der Hoek
Preview abstract
Creativity has always been considered a major differentiator to separate the good from the great, and we believe the importance of creativity to software development will only increase as GenAI becomes embedded in developer tool-chains and working practices. This paper uses the McLuhan tetrad alongside scenarios of how GenAI may disrupt software development more broadly, to identify potential impacts GenAI may have on creativity within software development. The impacts are discussed along with a future research agenda comprising six connected themes that consider how individual capabilities, team capabilities, the product, unintended consequences, society, and human aspects can be affected.
View details
Exploring the Feasibility of Remote Cardiac Auscultation Using Earphones
Tao Chen
Yongjie Yang
Xiuzhen Guo
Jie Xiong
Shangguan Longfei
MobiCom 2024: The 30th Annual International Conference On Mobile Computing And Networking
Preview abstract
People over 65 account for 80% of COVID deaths in the United States. In response to the pandemic, federal and state governments and commercial insurers are promoting video visits, through which the elderly can access specialists at home over the Internet without the risk of COVID exposure. However, current video visits rely only on video observation and conversation, so the specialist cannot assess the patient's health condition by performing auscultation.
This paper tries to address this key missing component in video visits by proposing Asclepius, a hardware-software solution that turns the patient's earphones into a stethoscope, allowing the specialist to hear the patient's fine-grained heart sound (i.e., PCG signals) in video visits. To achieve this goal, we contribute a low-cost plug-in peripheral that repurposes the earphone's speaker into a microphone and uses it to capture the patient's minute PCG signals from her ear canal. As the PCG signals suffer from strong attenuation and multi-path effects when propagating from the heart to ear canals, we then propose efficient signal processing algorithms coupled with a data-driven approach to de-reverberate and further correct the amplitude and frequency distortion in raw PCG receptions. We implement Asclepius on a 2-layer PCB board and follow the IRB protocol to evaluate its performance with 30 volunteers. Our extensive experiments show that Asclepius can effectively recover Phonocardiogram (PCG) signals with different types of earphones. The feedback from cardiologists also confirms the efficacy and efficiency of our system. PCG signal samples and benchmark results can be found at an anonymous link https://asclepius-system.github.io/
View details
Preview abstract
Generative AI (GAI) is proliferating, and among its many applications are to support creative work (e.g., generating text, images, music) and to enhance accessibility (e.g., captions of images and audio). As GAI evolves, creatives must consider how (or how not) to incorporate these tools into their practices. In this paper, we present interviews at the intersection of these applications. We learned from 10 creatives with disabilities who intentionally use and do not use GAI in and around their creative work. Their mediums ranged from audio engineering to leatherwork, and they collectively experienced a variety of disabilities, from sensory to motor to invisible disabilities. We share cross-cutting themes of their access hacks, how creative practice and access work become entangled, and their perspectives on how GAI should and should not fit into their workflows. In turn, we offer qualities of accessible creativity with responsible AI that can inform future research.
View details
Dynamics of magnetization at infinite temperature in a Heisenberg spin chain
Trond Andersen
Rhine Samajdar
Andre Petukhov
Jesse Hoke
Dmitry Abanin
Ilya Drozdov
Xiao Mi
Alexis Morvan
Charles Neill
Rajeev Acharya
Richard Ross Allen
Kyle Anderson
Markus Ansmann
Frank Arute
Kunal Arya
Juan Atalaya
Gina Bortoli
Alexandre Bourassa
Leon Brill
Michael Broughton
Bob Buckley
Tim Burger
Nicholas Bushnell
Juan Campero
Hung-Shen Chang
Jimmy Chen
Benjamin Chiaro
Desmond Chik
Josh Cogan
Roberto Collins
Paul Conner
William Courtney
Alex Crook
Ben Curtin
Agustin Di Paolo
Andrew Dunsworth
Clint Earle
Lara Faoro
Edward Farhi
Reza Fatemi
Vinicius Ferreira
Ebrahim Forati
Brooks Foxen
Gonzalo Garcia
Élie Genois
William Giang
Dar Gilboa
Raja Gosula
Alejo Grajales Dau
Steve Habegger
Michael Hamilton
Monica Hansen
Sean Harrington
Paula Heu
Gordon Hill
Markus Hoffmann
Trent Huang
Ashley Huff
Bill Huggins
Sergei Isakov
Justin Iveland
Cody Jones
Pavol Juhas
Marika Kieferova
Alexei Kitaev
Andrey Klots
Alexander Korotkov
Fedor Kostritsa
John Mark Kreikebaum
Dave Landhuis
Pavel Laptev
Kim Ming Lau
Lily Laws
Joonho Lee
Kenny Lee
Yuri Lensky
Alexander Lill
Wayne Liu
Salvatore Mandra
Orion Martin
Steven Martin
Seneca Meeks
Amanda Mieszala
Shirin Montazeri
Ramis Movassagh
Wojtek Mruczkiewicz
Ani Nersisyan
Michael Newman
JiunHow Ng
Murray Ich Nguyen
Tom O'Brien
Seun Omonije
Alex Opremcak
Rebecca Potter
Leonid Pryadko
David Rhodes
Charles Rocque
Negar Saei
Kannan Sankaragomathi
Henry Schurkus
Christopher Schuster
Mike Shearn
Aaron Shorter
Noah Shutty
Vladimir Shvarts
Vlad Sivak
Jindra Skruzny
Clarke Smith
Rolando Somma
George Sterling
Doug Strain
Marco Szalay
Doug Thor
Alfredo Torres
Guifre Vidal
Cheng Xing
Jamie Yao
Ping Yeh
Juhwan Yoo
Grayson Young
Yaxing Zhang
Ningfeng Zhu
Jeremy Hilton
Anthony Megrant
Yu Chen
Vadim Smelyanskiy
Vedika Khemani
Sarang Gopalakrishnan
Tomaž Prosen
Science, 384 (2024), pp. 48-53
Preview abstract
Understanding universal aspects of quantum dynamics is an unresolved problem in statistical mechanics. In particular, the spin dynamics of the one-dimensional Heisenberg model were conjectured to belong to the Kardar-Parisi-Zhang (KPZ) universality class based on the scaling of the infinite-temperature spin-spin correlation function. In a chain of 46 superconducting qubits, we studied the probability distribution of the magnetization transferred across the chain’s center, P(M). The first two moments of P(M) show superdiffusive behavior, a hallmark of KPZ universality. However, the third and fourth moments rule out the KPZ conjecture and allow for evaluating other theories. Our results highlight the importance of studying higher moments in determining dynamic universality classes and provide insights into universal behavior in quantum systems.
View details
Data Exchange Markets via Utility Balancing
Aditya Bhaskara
Sungjin Im
Kamesh Munagala
Govind S. Sankar
WebConf (2024)
Preview abstract
This paper explores the design of a balanced data-sharing marketplace for entities with heterogeneous datasets and machine learning models that they seek to refine using data from other agents. The goal of the marketplace is to encourage participation for data sharing in the presence of such heterogeneity. Our market design approach for data sharing focuses on interim utility balance, where participants contribute and receive equitable utility from refinement of their models. We present such a market model for which we study computational complexity, solution existence, and approximation algorithms for welfare maximization and core stability. We finally support our theoretical insights with simulations on a mean estimation task inspired by road traffic delay estimation.
View details
Understanding metric-related pitfalls in image analysis validation
Annika Reinke
Lena Maier-Hein
Paul Jager
Shravya Shetty
Understanding Metrics Workgroup
Nature Methods (2024)
Preview abstract
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
View details