Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
We present PhoMoH, a neural network methodology to construct generative models of photo-realistic 3D geometry and appearance of human heads including hair, beards, an oral cavity, and clothing. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photo-realistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and enable the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
Winter Conference on Applications of Computer Vision 2024 (2024) (to appear)
We propose Hierarchical Text Spotter (HTS), the first method for the joint task of word-level text spotting and geometric layout analysis.
HTS can annotate text in images with a hierarchical representation of 4 levels: character, word, line, and paragraph.
The proposed HTS is characterized by two novel components:
(1) a Unified-Detector-Polygon (UDP) that produces Bezier Curve polygons of text lines and an affinity matrix for paragraph grouping between detected lines;
(2) a Line-to-Character-to-Word (L2C2W) recognizer that splits lines into characters and further merges them back into words.
HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as geometric layout analysis tasks.
Code will be released upon acceptance.
Analyzing Prospects for Quantum Advantage in Topological Data Analysis
Dominic W. Berry
Yuan Su
Casper Gyurik
Robbie King
Joao Basso
Abhishek Rajput
Nathan Wiebe
Vedran Dunjko
PRX Quantum, 5 (2024), pp. 010319
Lloyd et al. were the first to demonstrate the promise of quantum algorithms for computing Betti numbers in persistent homology (a way of characterizing topological features of data sets). Here, we propose, analyze, and optimize an improved quantum algorithm for topological data analysis (TDA) with reduced scaling, including a method for preparing Dicke states based on inequality testing, a more efficient amplitude estimation algorithm using Kaiser windows, and an optimal implementation of eigenvalue projectors based on Chebyshev polynomials. We compile our approach to a fault-tolerant gate set and estimate constant factors in the Toffoli complexity. Our analysis reveals that super-quadratic quantum speedups are only possible for this problem when targeting a multiplicative error approximation and the Betti number grows asymptotically. Further, we propose a dequantization of the quantum TDA algorithm that shows that having exponentially large dimension and Betti number are necessary, but insufficient, conditions for super-polynomial advantage. We then introduce and analyze specific problem examples for which super-polynomial advantages may be achieved, and argue that quantum circuits with tens of billions of Toffoli gates can solve some seemingly classically intractable instances.
50 Shades of Support: A Device-Centric Analysis of Android Security Updates
Abbas Acar
Esteban Luques
Harun Oz
Ahmet Aris
Selcuk Uluagac
Network and Distributed System Security (NDSS) Symposium (2024)
Android is by far the most popular OS with over three billion active mobile devices. As in any software, uncovering vulnerabilities on Android devices and applying timely patches are both critical. Android Open Source Project (AOSP) has initiated efforts to improve the traceability of security updates through Security Patch Levels (SPLs) assigned to devices. While this initiative provided better traceability for the vulnerabilities, it has not entirely resolved the issues related to the timeliness and availability of security updates for end users. Recent studies on Android security updates have focused on the issue of delay during the security update roll-out, largely attributing this to factors related to fragmentation. However, these studies fail to capture the entire Android ecosystem as they primarily examine flagship devices or do not paint a comprehensive picture of the Android devices’ lifecycle due to the datasets spanning over a short timeframe. To address this gap in the literature, we utilize a device-centric approach to analyze the security update behavior of Android devices. Our approach aims to understand the security update distribution behavior of OEMs (e.g., Samsung) by using a representative set of devices from each OEM and characterize the complete lifecycle of an average Android device. We obtained 367K official security update records from public sources, spanning from 2014 to 2023. Our dataset contains 599 unique devices from four major OEMs that are used in 97 countries and are associated with 109 carriers. We identify significant differences in the roll-out of security updates across different OEMs, device models/types, and geographical regions across the world. Our findings show that the reasons for the delay in the roll-out of security updates are not limited to fragmentation but also involve OEM-specific factors. Our analysis also uncovers certain key issues that can be readily addressed as well as exemplary practices that can be immediately adopted by OEMs in practice.
MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization
Han-Byul Kim
Joo Hyung Lee
Sungjoo Yoo
Hong-Seok Kim
Proc. The 38th Annual AAAI Conference on Artificial Intelligence (AAAI) (2024)
Mixed-precision quantization of efficient networks often suffers from activation instability encountered in the exploration of bit selections. To address this problem, we propose a novel method called MetaMix, which consists of bit selection and weight training phases. The bit selection phase iterates two steps, (1) the mixed-precision-aware weight update, and (2) the bit-search training with the fixed mixed-precision-aware weights, which together reduce activation instability in mixed-precision quantization and contribute to fast and high-quality bit selection. The weight training phase exploits the weights and step sizes trained in the bit selection phase and fine-tunes them, thereby offering fast training. Our experiments with efficient and hard-to-quantize networks, i.e., MobileNet v2 and v3, and ResNet-18 on ImageNet show that our proposed method pushes the boundary of mixed-precision quantization, in terms of accuracy vs. operations, by outperforming both mixed- and single-precision SOTA methods.
SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling
Andrei Zanfir
Teodor Szente
Mihai Zanfir
International Conference on 3D Vision (2024)
We present SPHEAR, an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings. We shift the paradigm away from the classical Non-Rigid Registration methods, which operate under various surface priors, increasing reconstruction fidelity and minimizing required human intervention. Additionally, SPHEAR is a complete model that allows sampling not only diverse synthetic head shapes and facial expressions, but also gaze directions, high-resolution color textures, surface normal maps, and haircuts represented in detail as strands. SPHEAR can be used for automatic realistic visual data generation, semantic annotation, and general reconstruction tasks. Compared to state-of-the-art approaches, our components are fast and memory efficient, and experiments support the validity of our design choices and the accuracy of registration, reconstruction and generation techniques.
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
Christina Tsalicoglou
Fabian Manhardt
Michael Niemeyer
3DV 2024 (2024)
The ability to generate highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises whether this can also be achieved in the generation of 3D content from such text prompts. To this end, a new line of methods has recently emerged that tries to harness diffusion models, trained on 2D images, for supervision of 3D model generation using view-dependent prompts. While achieving impressive results, these methods have two major drawbacks. First, rather than commonly used 3D meshes, they generate neural radiance fields (NeRFs), making them impractical for most real applications. Second, these approaches tend to produce over-saturated models, giving the output a cartoonish-looking effect. Therefore, in this work we propose a novel method for generation of highly realistic-looking 3D meshes. To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction. In addition, we propose a novel way to finetune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.
We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an O(1/n) factor. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.
LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals
Arjun Karpur
Guilherme Perrotta
Ricardo Martin-Brualla
Proc. 3DV'24 (2024) (to appear)
Finding localized correspondences across different images of the same object is crucial to understand its geometry. In recent years, this problem has seen remarkable progress with the advent of deep learning-based local image features and learnable matchers. Still, learnable matchers often underperform when there exist only small regions of co-visibility between image pairs (i.e., wide camera baselines). To address this problem, we leverage recent progress in coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks and enhances their capabilities by integrating noisy, estimated 3D signals to boost correspondence estimation. When integrating 3D signals into the matcher model, we show that a suitable positional encoding is critical to effectively make use of the low-dimensional 3D information. We experiment with two different 3D signals (normalized object coordinates and monocular depth estimates) and evaluate our method on large-scale (synthetic and real) datasets containing object-centric image pairs across wide baselines. We observe strong feature matching improvements compared to 2D-only methods, with up to +6% total recall and +28% precision at fixed recall. Additionally, we demonstrate that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs, up to 8.6% compared to the 2D-only approach.
Wear's my Data? Understanding the Cross-Device Runtime Permission Model in Wearables
Doguhan Yeke
Muhammad Ibrahim
Habiba Farukh
Abdullah Imran
Antonio Bianchi
Z. Berkay Celik
IEEE Symposium on Security and Privacy (2024) (to appear)
Wearable devices are becoming increasingly important, helping us stay healthy and connected. There are a variety of app-based wearable platforms that can be used to manage these devices. The apps on wearable devices often work with a companion app on users’ smartphones. The wearable device and the smartphone typically use two separate permission models that work synchronously to protect sensitive data. However, this design creates an opaque view of the management of permission-protected data, resulting in over-privileged data access without the user’s explicit consent. In this paper, we performed the first systematic analysis of the interaction between the Android and Wear OS permission models. Our analysis is two-fold. First, through taint analysis, we showed that cross-device flows of permission-protected data happen in the wild, demonstrating that 28 apps (out of the 150 we studied) on Google Play have sensitive data flows between the wearable app and its companion app. We found that these data flows occur without the users’ explicit consent, introducing the risk of violating user expectations. Second, we conducted an in-lab user study to assess users’ understanding of permissions when subject to cross-device communication (n = 63). We found that 66.7% of the users are unaware of the possibility of cross-device sensitive data flows, which impairs their understanding of permissions in the context of wearable devices and puts their sensitive data at risk. We also showed that users are vulnerable to a new class of attacks that we call cross-device permission phishing attacks on wearable devices. Lastly, we performed a preliminary study on other watch platforms (i.e., Apple’s watchOS, Fitbit, Garmin OS) and found that all these platforms suffer from similar privacy issues. As countermeasures for the potential privacy violations in cross-device apps, we suggest improvements in the system prompts and the permission model to enable users to make better-informed decisions, as well as on app markets to identify malicious cross-device data flows.
Sequence labeling is a core task in text understanding for IE/IR systems. Text generation models have increasingly become the go-to solution for such tasks (e.g., entity extraction and dialog slot filling). While most research has focused on labeling accuracy, a key aspect of vital practical importance has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best way to obtain well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach, which leverages statistics from top-k predictions by a beam search, significantly reduces calibration errors of the predictions of a generative sequence labeling model.
Drug Design on Quantum Computers
Raffaele Santagati
Alán Aspuru-Guzik
Matthias Degroote
Leticia Gonzalez
Elica Kyoseva
Nikolaj Moll
Markus Oppel
Robert Parrish
Michael Streif
Christofer Tautermann
Horst Weiss
Nathan Wiebe
Clemens Utschig-Utschig
Nature Physics (2024)
The promised industrial applications of quantum computers often rest on their anticipated ability to perform accurate, efficient quantum chemical calculations. Computational drug discovery relies on accurate predictions of how candidate drugs interact with their targets in a cellular environment involving several thousands of atoms at finite temperatures. Although quantum computers are still far from being used as daily tools in the pharmaceutical industry, here we explore the challenges and opportunities of applying quantum computers to drug design. We discuss where these could transform industrial research and identify the substantial further developments needed to reach this goal.
Quantum Computation of Stopping Power for Inertial Fusion Target Design
Dominic Berry
Alina Kononov
Alec White
Joonho Lee
Andrew Baczewski
Proceedings of the National Academy of Sciences, 121 (2024), e2317772121
Stopping power is the rate at which a material absorbs the kinetic energy of a charged particle passing through it, one of many properties needed over a wide range of thermodynamic conditions in modeling inertial fusion implosions. First-principles stopping calculations are classically challenging because they involve the dynamics of large electronic systems far from equilibrium, with accuracies that are particularly difficult to constrain and assess in the warm-dense conditions preceding ignition. Here, we describe a protocol for using a fault-tolerant quantum computer to calculate stopping power from a first-quantized representation of the electrons and projectile. Our approach builds upon the electronic structure block encodings of Su et al. [PRX Quantum 2, 040332 (2021)], adapting and optimizing those algorithms to estimate observables of interest from the non-Born-Oppenheimer dynamics of multiple particle species at finite temperature. We also work out the constant factors associated with a novel implementation of a high-order Trotter approach to simulating a grid representation of these systems. Ultimately, we report logical qubit requirements and leading-order Toffoli costs for computing the stopping power of various projectile/target combinations relevant to interpreting and designing inertial fusion experiments. We estimate that scientifically interesting and classically intractable stopping power calculations can be quantum simulated with roughly the same number of logical qubits and about one hundred times more Toffoli gates than are required for state-of-the-art quantum simulations of industrially relevant molecules such as FeMoCo or P450.
Using Early Readouts to Mediate Featural Bias in Distillation
Rishabh Tiwari
Durga Sivasubramanian
Anmol Mekala
Ganesh Ramakrishnan
WACV 2024 (2024)
Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. This vulnerability is aggravated in distillation, where a (student) model may have less representational capacity than the corresponding teacher model. Often, knowledge of specific problem features is used to reweight instances and rebalance the learning process. We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers. We show that these early readouts automatically identify problem instances or groups in the form of confident, incorrect predictions. We improve group fairness measures across benchmark datasets by leveraging these signals to mediate between teacher logits and supervised labels. We extend our results to the closely related but distinct problem of domain generalization, which also critically depends on the quality of learned features. We provide secondary analyses that bring insight into the role of feature learning in supervision and distillation.
Towards Generalist Biomedical AI
Danny Driess
Andrew Carroll
Chuck Lau
Ryutaro Tanno
Ira Ktena
Anil Palepu
Basil Mustafa
Aakanksha Chowdhery
Simon Kornblith
Philip Mansfield
Sushant Prakash
Renee Wong
Sunny Virmani
Sara Mahdavi
Bradley Green
Ewa Dominowska
Joelle Barral
Karan Singhal
Pete Florence
NEJM AI (2024)
BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery.
METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports.
RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM Multimodal reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.
CONCLUSIONS: Although considerable work is needed to validate these models in real-world cases and understand if cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems.