Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10132 publications
Hovering Over the Key to Text Input in XR
Diar Abdlkarim
Arpit Bhatia
Stuart Macgregor
Jason Fotso-Puepi
Hasti Seifi
Massimiliano Di Luca
Karan Ahuja
Preview abstract
Virtual, Mixed, and Augmented Reality (XR) technologies hold immense potential for transforming productivity beyond PC. Therefore there is a critical need for improved text input solutions for XR. However, achieving efficient text input in these environments remains a significant challenge. This paper examines the current landscape of XR text input techniques, focusing on the importance of keyboards (both physical and virtual) as essential tools. We discuss the unique challenges and opportunities presented by XR, synthesizing key trends from existing solutions.
View details
Sharing is leaking: blocking transient-execution attacks with core-gapped confidential VMs
Charly Castes
29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (ASPLOS '24) (2024)
Preview abstract
Confidential VMs on platforms such as Intel TDX, AMD SEV and Arm CCA promise greater security for cloud users against even a hypervisor-level attacker, but this promise has been shattered by repeated transient-execution vulnerabilities and CPU bugs. At the root of this problem lies the need to multiplex CPU cores with all their complex microarchitectural state among distrusting entities, with an untrusted hypervisor in control of the multiplexing.
We propose core-gapped confidential VMs, a set of software-only modifications that ensure that no distrusting code shares a core, thus removing all same-core side-channels and transient-execution vulnerabilities from the guest’s TCB. We present an Arm-based prototype along with a performance evaluation showing that, not only does core-gapping offer performance competitive with non-confidential VMs, the greater locality achieved by avoiding shared cores can even improve performance for CPU-intensive workloads.
View details
Visual Program Tuning: Training Large Multimodal Models to Reason like Programs
Yushi Hu
Krishna Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
Conference on Computer Vision and Pattern Recognition (2024)
Preview abstract
Solving complex visual tasks (e.g., “Who invented the musical instrument on the right?”) involves back-and-forth between visual processing and reasoning. Visual programming is a recent multimodal framework that has shown promise in conducting visual reasoning in an interpretable and compositional manner. However, this framework is error-prone—it can lead to a wrong answer whenever the program itself is wrong, or when any of the steps of the program are solved incorrectly, thus leading to worse overall performance than end-to-end systems trained with labeled data. Moreover, it is inefficient to involve multiple steps (i.e., generating and then running programs) during inference. Ideally, a single large multimodal model (LMM) should directly conduct similar reasoning and yield the correct answer.
In this work, we propose Visual Program Tuning (VPT), which leverages visual programs for teaching LLMs to reason via instruction tuning. VPT rewrites the execution traces of visual programs as chain-of-thought reasoning steps, and tunes an LMM to output not only the label but its reasoning as well. Extensive experiments on complex vision tasks show that models trained with VPT achieve state-of-the-art accuracy while being able to produce interpretable and faithful reasoning steps. PaLI-X + VPT outperforms all existing LMMs on a wide range of visual tasks, improving performance on counting, spatial relations, and compositional reasoning tasks. VPT is also helpful for quick adaptation on new tasks. Our experiments on content moderation show that fine-tuning LMMs with program-augmented examples is more sample efficient than traditional supervised training.
View details
Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions
Barbara Ikica
Hamidreza Alvari
Mehdi Hafezi Manshadi
(2024)
Preview abstract
The emergence of synthetic data represents a pivotal shift in modern machine learning, offering a solution to satisfy the need for large volumes of data in domains where real data is scarce, highly private, or difficult to obtain. We investigate the feasibility of creating realistic, large-scale synthetic datasets of user-generated content, noting that such content is increasingly prevalent and a source of frequently sought information. Large language models (LLMs) offer a starting point for generating synthetic social media discussion threads, due to their ability to produce diverse responses that typify online interactions. However, as we demonstrate, straightforward application of LLMs yields limited success in capturing the complex structure of online discussions, and standard prompting mechanisms lack sufficient control. We therefore propose a multi-step generation process, predicated on the idea of creating compact representations of discussion threads, referred to as scaffolds. Our framework is generic yet adaptable to the unique characteristics of specific social media platforms. We demonstrate its feasibility using data from two distinct online discussion platforms. To address the fundamental challenge of ensuring the representativeness and realism of synthetic data, we propose a portfolio of evaluation measures to compare various instantiations of our framework.
View details
Searching for Dermatology Information Online using Images vs Text: a Randomized Study
Jay Hartford
Amit Talreja
Natalie Salaets
Kimberley Raiford
Jay Nayar
Dounia Berrada
Harsh Kharbanda
Lou Wang
Peggy Bui
medRxiv (2024)
Preview abstract
Background: Skin conditions are extremely common worldwide, and are an important cause of both anxiety and morbidity. Since the advent of the internet, individuals have used text-based search (eg, “red rash on arm”) to learn more about concerns on their skin, but this process is often hindered by the inability to accurately describe the lesion’s morphology. In the study, we surveyed respondents’ experiences with an image-based search, compared to the traditional text-based search experience.
Methods: An internet-based survey was conducted to evaluate the experience of text-based vs image-based search for skin conditions. We recruited respondents from an existing cohort of volunteers in a commercial survey panel; survey respondents that met inclusion/exclusion criteria, including willingness to take photos of a visible concern on their body, were enrolled. Respondents were asked to use the Google mobile app to conduct both regular text-based search (Google Search) and image-based search (Google Lens) for their concern, with the order of text vs. image search randomized. Satisfaction for each search experience along six different dimensions were recorded and compared, and respondents’ preferences for the different search types along these same six dimensions were recorded.
Results: 372 respondents were enrolled in the study, with 44% self-identifying as women, 86% as White and 41% over age 45. The rate of respondents who were at least moderately familiar with searching for skin conditions using text-based search versus image-based search were 81.5% and 63.5%, respectively. After using both search modalities, respondents were highly satisfied with both image-based and text-based search, with >90% at least somewhat satisfied in each dimension and no significant differences seen between text-based and image-based search when examining the responses on an absolute scale per search modality. When asked to directly rate their preferences in a comparative way, survey respondents preferred image-based search over text-based search in 5 out of 6 dimensions, with an absolute 9.9% more preferring image-based search over text-based search overall (p=0.004). 82.5% (95% CI 78.2 - 86.3) reported a preference to leverage image-based search (alone or in combination with text-based search) in future searches. Of those who would prefer to use a combination of both, 64% indicated they would like to start with image-based search, indicating that image-based search may be the preferred entry point for skin-related searches.
Conclusion: Despite being less familiar with image-based search upon study inception, survey respondents generally preferred image-based search to text-based search and overwhelmingly wanted to include this in future searches. These results suggest the potential for image-based search to play a key role in people searching for information regarding skin concerns.
View details
Preview abstract
Recent significant advances in text-to-image models unlock the possibility of training vision systems using synthetic images, potentially overcoming the difficulty of collecting curated data at scale. It is unclear, however, how these models behave at scale, as more synthetic data is added to the training set. In this paper we study the scaling laws of synthetic images generated by state of the art text-to-image models, for the training of supervised models: image classifiers with label supervision, and CLIP with language supervision. We identify several factors, including text prompts, classifier-free guidance scale, and types of text-to-image models, that significantly affect scaling behavior. After tuning these factors, we observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training, while they significantly underperform in scaling when training supervised image classifiers. Our analysis indicates that the main reason for this underperformance is the inability of off-the-shelf text-to-image models to generate certain concepts, a limitation that significantly impairs the training of image classifiers. Our findings also suggest that scaling synthetic data can be particularly effective in scenarios such as: (1) when there is a limited supply of real images for a supervised problem (e.g., fewer than 0.5 million images in ImageNet), (2) when the evaluation dataset diverges significantly from the training data, indicating the out-of-distribution scenario, or (3) when synthetic data is used in conjunction with real images, as demonstrated in the training of CLIP models.
View details
Analyzing Prospects for Quantum Advantage in Topological Data Analysis
Dominic W. Berry
Yuan Su
Casper Gyurik
Robbie King
Joao Basso
Abhishek Rajput
Nathan Wiebe
Vedran Djunko
PRX Quantum, 5 (2024), pp. 010319
Preview abstract
Lloyd et al. were first to demonstrate the promise of quantum algorithms for computing Betti numbers in persistent homology (a way of characterizing topological features of data sets). Here, we propose, analyze, and optimize an improved quantum algorithm for topological data analysis (TDA) with reduced scaling, including a method for preparing Dicke states based on inequality testing, a more efficient amplitude estimation algorithm using Kaiser windows, and an optimal implementation of eigenvalue projectors based on Chebyshev polynomials. We compile our approach to a fault-tolerant gate set and estimate constant factors in the Toffoli complexity. Our analysis reveals that super-quadratic quantum speedups are only possible for this problem when targeting a multiplicative error approximation and the Betti number grows asymptotically. Further, we propose a dequantization of the quantum TDA algorithm that shows that having exponentially large dimension and Betti number are necessary, but insufficient conditions, for super-polynomial advantage. We then introduce and analyze specific problem examples for which super-polynomial advantages may be achieved, and argue that quantum circuits with tens of billions of Toffoli gates can solve some seemingly classically intractable instances.
View details
SAC124 - SSAC Advice on Name Collision Analysis
Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2024), pp. 15
Preview abstract
In this document the Security and Stability Advisory Committee (SSAC) provides its analysis of
the findings and recommendations presented within the Name Collision Analysis Project
(NCAP) Study Two and the proposed Name Collision Risk Assessment Framework. The SSAC
also provides additional commentary on several aspects of the NCAP Study Two Report and
makes recommendations to the ICANN Board.
View details
Generative models improve fairness of medical classifiers under distribution shifts
Ira Ktena
Olivia Wiles
Isabela Albuquerque
Sylvestre-Alvise Rebuffi
Ryutaro Tanno
Danielle Belgrave
Taylan Cemgil
Nature Medicine (2024)
Preview abstract
Domain generalization is a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions might be lower than expected because of discrepancies between the data encountered during deployment and development. Underrepresentation of some groups or conditions during model development is a common cause of this phenomenon. This challenge is often not readily addressed by targeted data acquisition and ‘labeling’ by expert clinicians, which can be prohibitively expensive or practically impossible because of the rarity of conditions or the available clinical expertise. We hypothesize that advances in generative artificial intelligence can help mitigate this unmet need in a steerable fashion, enriching our training dataset with synthetic examples that address shortfalls of underrepresented conditions or subgroups. We show that diffusion models can automatically learn realistic augmentations from data in a label-efficient manner. We demonstrate that learned augmentations make models more robust and statistically fair in-distribution and out of distribution. To evaluate the generality of our approach, we studied three distinct medical imaging contexts of varying difficulty: (1) histopathology, (2) chest X-ray and (3) dermatology images. Complementing real samples with synthetic ones improved the robustness of models in all three medical tasks and increased fairness by improving the accuracy of clinical diagnosis within underrepresented groups, especially out of distribution.
View details
Preview abstract
Graphs are a powerful tool for representing and analyzing complex relationships in real-world applications such as social networks, recommender systems, and computational finance. Reasoning on graphs is essential for drawing inferences about the relationships between entities in a complex system, and to identify hidden patterns and trends. Despite the remarkable progress in automated reasoning with natural text, reasoning on graphs with large language models (LLMs) remains an understudied problem. In this work, we perform the first comprehensive study of encoding graph-structured data as text for consumption by LLMs. We show that LLM performance on graph reasoning tasks varies on three fundamental levels: (1) the graph encoding method, (2) the nature of the graph task itself, and (3) interestingly, the very structure of the graph considered. These novel results provide valuable insight on strategies for encoding graphs as text. Using these insights we illustrate how the correct choice of encoders can boost performance on graph reasoning tasks inside LLMs by 4.8% to 61.8%, depending on the task.
View details
Improved Communication-Privacy Trade-offs in L2 Mean Estimation under Streaming Differential Privacy
Wei-Ning Chen
Albert No
Sewoong Oh
International Conference on Machine Learning (ICML) (2024)
Preview abstract
We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers.
In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm directly operates in $L_2$ geometry, yielding MSEs that fast converge to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL type optimizers. Empirically, our method demonstrates at least a 100x improvement of compression for DP-SGD across various FL tasks.
View details
SAC125 - SSAC Report on Registrar Nameserver Management
Gautam Akiwate
Tim April
kc claffy
Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2024), pp. 56
Preview abstract
During domain registration, a minimum of two nameservers are typically required, and this
remains a requirement for any future updates to the domain. Often, domains are delegated to
nameservers that are subordinate to some other domains, creating inter-domain dependencies.
This network of dependencies creates a scenario where the functionality of a domain depends
on the operational status of another domain. This setup lacks contractual or procedural
safeguards against disruption or misuse, especially when the nameserver parent domain expires.
Most registries forbid deleting an expired domain if other domains depend on it for name
resolution. These constraints aim to prevent disruptions in DNS resolution for the dependent
domains. However, this also means that the expired domain remains in a liminal state, neither
fully operational nor completely removed. When registrars cannot delete expired domains with
dependents, they are forced to bear the burden of sponsoring the domain without remuneration
from the registrant. A peer-reviewed study, "Risky BIZness: Risks derived from Registrar Name
Management," observed that some registrars have found and utilized a loophole to these
constraints by renaming the host objects that are subordinate to the expiring domain.1 Once
renamed, the host objects are what Akiwate et al.—and subsequently the SSAC—refers to as
sacrificial nameservers.
This report focuses on a specific type of sacrificial nameserver where the parent domains of the renamed host objects are considered to be unsafe because they are registrable. Registrable
parent domains of sacrificial nameservers introduce a new attack surface for domain resolution
hijacking, as malicious actors can exploit unsafe sacrificial nameservers to gain unauthorized
control over the dependent domains, leading to manipulation or disruption. Unlike traditional
domain hijacking techniques that exploit compromised account credentials or manipulate the
resolution protocol, this report focuses on this unforeseen risk arising from a longstanding
practice employed by some registrars.
In this report, the SSAC explores potential solutions to remediate exposed domains and prevent
the creation of new unsafe sacrificial nameservers. The SSAC examines each proposed solution for its feasibility, effectiveness, and potential to reduce the attack surface without introducing undue complexity or new vulnerabilities into the DNS ecosystem.
View details
From Provenance to Aberrations: Image Creator and Screen Reader User Perspectives on Alt Text for AI-Generated Images
Maitraye Das
Alexander J. Fiannaca
CHI Conference on Human Factors in Computing Systems (2024)
Preview abstract
AI-generated images are proliferating as a new visual medium. However, state-of-the-art image generation models do not output alternative (alt) text with
their images, rendering them largely inaccessible to screen reader users (SRUs). Moreover, less is known about what information would be most desirable
to SRUs in this new medium. To address this, we invited AI image creators and SRUs to evaluate alt text prepared from various sources and write their own
alt text for AI images. Our mixed-methods analysis makes three contributions. First, we highlight creators’ perspectives on alt text, as creators are well-positioned
to write descriptions of their images. Second, we illustrate SRUs’ alt text needs particular to the emerging medium of AI images. Finally, we discuss the
promises and pitfalls of utilizing text prompts written as input for AI models in alt text generation, and areas where broader digital accessibility guidelines
could expand to account for AI images.
View details
Sleep patterns and risk of chronic disease as measured by long-term monitoring with commercial wearable devices in the All of Us Research Program
Neil S. Zheng
Jeffrey Annis
Hiral Master
Lide Han
Karla Gleichauf
Melody Nasser
Peyton Coleman
Stacy Desine
Douglas M. Ruderfer
John Hernandez
Logan D. Schneider
Evan L. Brittain
Nature Medicine (2024)
Preview abstract
Poor sleep health is associated with increased all-cause mortality and incidence of many chronic conditions. Previous studies have relied on cross-sectional and self-reported survey data or polysomnograms, which have limitations with respect to data granularity, sample size and longitudinal information. Here, using objectively measured, longitudinal sleep data from commercial wearable devices linked to electronic health record data from the All of Us Research Program, we show that sleep patterns, including sleep stages, duration and regularity, are associated with chronic disease incidence. Of the 6,785 participants included in this study, 71% were female, 84% self-identified as white and 71% had a college degree; the median age was 50.2 years (interquartile range = 35.7, 61.5) and the median sleep monitoring period was 4.5 years (2.5, 6.5). We found that rapid eye movement sleep and deep sleep were inversely associated with the odds of incident atrial fibrillation and that increased sleep irregularity was associated with increased odds of incident obesity, hyperlipidemia, hypertension, major depressive disorder and generalized anxiety disorder. Moreover, J-shaped associations were observed between average daily sleep duration and hypertension, major depressive disorder and generalized anxiety disorder. These findings show that sleep stages, duration and regularity are all important factors associated with chronic disease development and may inform evidence-based recommendations on healthy sleeping habits.
View details
Efficient Location Sampling Algorithms for Road Networks
Vivek Kumar
Ameya Velingker
Santhoshini Velusamy
WebConf (2024)
Preview abstract
Many geographic information systems applications rely on the data provided by user devices in the road network. Such applications include traffic monitoring, driving navigation, detecting road closures or the construction of new roads, etc. This signal is collected by sampling locations from the user trajectories and is a critical process for all such systems. Yet, it has not been sufficiently studied in the literature. The most natural way to sample a trajectory is perhaps using a frequency based algorithm, e.g., sample every $x$ seconds. However, as we argue in this paper, such a simple strategy can be very wasteful in terms of resources (e.g., server-side processing, user battery) and in terms of the amount of user data that it maintains. In this work we conduct a horizontal study of various location sampling algorithms (including frequency-based, road geography-based, reservoir-sampling based, etc.) and extract their trade-offs in terms of various metrics of interest, such as, the size of the stored data and the induced quality of training for prediction tasks (e.g., predicting speeds) using the road network of New York City.
View details