Publications
Publishing our work allows us to share ideas and work collaboratively to advance the field of computer science.
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
Winter Conference on Applications of Computer Vision 2024 (2024) (to appear)
Abstract
We propose Hierarchical Text Spotter (HTS), the first method for the joint task of word-level text spotting and geometric layout analysis.
HTS can annotate text in images with a hierarchical representation of 4 levels: character, word, line, and paragraph.
The proposed HTS is characterized by two novel components:
(1) a Unified-Detector-Polygon (UDP) that produces Bézier-curve polygons of text lines and an affinity matrix for paragraph grouping between detected lines;
(2) a Line-to-Character-to-Word (L2C2W) recognizer that splits lines into characters and further merges them back into words.
HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as geometric layout analysis tasks.
Code will be released upon acceptance.
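The affinity-based paragraph grouping can be pictured with a minimal sketch. It assumes (our assumption, not necessarily what UDP does) that two lines share a paragraph whenever their symmetrized affinity exceeds a fixed threshold, so paragraphs are the connected components of that graph. The function name `group_lines_into_paragraphs` and the 0.5 threshold are illustrative.

```python
import numpy as np


def group_lines_into_paragraphs(affinity: np.ndarray, threshold: float = 0.5) -> list[list[int]]:
    """Group detected text lines into paragraphs (hypothetical rule).

    Lines i and j are linked when their symmetrized affinity exceeds
    `threshold`; paragraphs are the connected components (union-find).
    """
    n = affinity.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Average both directions, since a predicted matrix need not be symmetric.
            if (affinity[i, j] + affinity[j, i]) / 2 > threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, a 4×4 matrix with high affinity between lines 0 and 1 and between lines 2 and 3 yields two paragraphs, `[[0, 1], [2, 3]]`.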
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
Michael Niemeyer
Christina Tsalicoglou
Fabian Manhardt
3DV 2024 (2024)
Abstract
Generating highly realistic 2D images from mere text prompts has recently made huge progress in speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises whether this can also be achieved for the generation of 3D content from such text prompts. To this end, a new line of methods has recently emerged that harnesses diffusion models, trained on 2D images, to supervise 3D model generation using view-dependent prompts. While achieving impressive results, these methods have two major drawbacks. First, rather than the commonly used 3D meshes, they generate neural radiance fields (NeRFs), making them impractical for most real applications. Second, these approaches tend to produce over-saturated models, giving the output a cartoonish look. Therefore, in this work we propose a novel method for generating highly realistic-looking 3D meshes. To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction. In addition, we propose a novel way to fine-tune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.
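The abstract does not specify how the SDF backbone is turned into the density that volume rendering needs. One common choice, used here purely as an assumption, is the VolSDF-style conversion sigma = alpha * Psi_beta(-s), where Psi_beta is the CDF of a zero-mean Laplace distribution with scale beta. The values of alpha and beta and the helper names are illustrative. A mesh can then be extracted from the zero level set of the SDF (for example with marching cubes) instead of from a density threshold.

```python
import numpy as np


def laplace_cdf(x: np.ndarray, beta: float) -> np.ndarray:
    """CDF of a zero-mean Laplace distribution with scale `beta`."""
    # min/max keep the unused branch of np.where from overflowing.
    return np.where(
        x <= 0,
        0.5 * np.exp(np.minimum(x, 0) / beta),
        1.0 - 0.5 * np.exp(-np.maximum(x, 0) / beta),
    )


def sdf_to_density(sdf: np.ndarray, alpha: float = 100.0, beta: float = 0.01) -> np.ndarray:
    """VolSDF-style density (assumed): sigma = alpha * Psi_beta(-sdf).

    Inside the surface (sdf < 0) the density approaches alpha, outside it
    decays to zero, and exactly on the surface it equals alpha / 2.
    """
    return alpha * laplace_cdf(-sdf, beta)


def sphere_sdf(points: np.ndarray, radius: float = 1.0) -> np.ndarray:
    """Signed distance to a sphere centred at the origin (negative inside)."""
    return np.linalg.norm(points, axis=-1) - radius
```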
Abstract
We present PhoMoH, a neural network methodology to construct generative models of photo-realistic 3D geometry and appearance of human heads including hair, beards, an oral cavity, and clothing. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photo-realistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and enable the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.
Wear's my Data? Understanding the Cross-Device Runtime Permission Model in Wearables
Doguhan Yeke
Muhammad Ibrahim
Habiba Farukh
Abdullah Imran
Antonio Bianchi
Z. Berkay Celik
IEEE Security and Privacy (2024) (to appear)
Abstract
Google’s Wear OS is a version of Android designed for wearable devices. Apps running on these devices often work in conjunction with a "companion" app running on an Android smartphone. Currently, the wearable device and the smartphone use two separate runtime permission models. This situation creates an opaque view of how permission-protected data is managed, resulting in over-privileged data access without the user’s explicit consent. To address this issue, we performed the first systematic analysis of the interaction between the Android and Wear OS permission models. Our analysis is two-fold. First, we use static taint analysis to show whether and how permission-protected data flows between a Wear OS app and its companion app, quantifying these flows across 150 real-world wearable apps. Our taint analysis revealed 28 apps with sensitive data flows between the Wear OS app and its companion app. These data flows occur without the users’ explicit consent, thereby introducing the risk of unintended data flows. Second, to uncover users’ understanding of these data flows, we conducted an in-lab user study (n = 63) to answer the question: are users aware of which device can access which data? We found that 66.7% of the users are unaware of the unintended data flows and have a limited understanding of the runtime permission model in general, putting their sensitive data at risk. To mitigate potential privacy violations in the runtime permission model for cross-device apps, we suggest improvements to system prompts that enable users to make better-informed decisions.
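To make the idea of a source-to-sink taint flow concrete, here is a toy reachability check over a hand-written data-flow graph. A flow is reported whenever a permission-protected source can reach a cross-device sink. All node names are hypothetical labels. Real static taint analysis operates on app bytecode rather than on a prebuilt graph, so this only sketches the underlying question and is not the authors' tooling.

```python
from collections import deque


def find_tainted_flows(edges, sources, sinks):
    """Return (source, sink) pairs connected by a path in a data-flow graph.

    `edges` maps each node to the nodes its value flows into. Each source is
    explored with breadth-first search; reached sinks are reported as flows.
    """
    flows = []
    for src in sources:
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node in sinks:
                flows.append((src, node))
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return flows
```

With a hypothetical graph `read_location -> build_payload -> send_to_companion` and `read_heart_rate -> log_locally`, only the location source is reported as reaching the companion-app sink.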
50 Shades of Support: A Device-Centric Analysis of Android Security Updates
Abbas Acar
Esteban Luques
Harun Oz
Ahmet Aris
Selcuk Uluagac
Network and Distributed System Security (NDSS) Symposium (2024) (to appear)
Abstract
Android is by far the most popular OS, with over three billion active mobile devices. As with any software, uncovering vulnerabilities on Android devices and applying timely patches are both critical. The Android Open Source Project (AOSP) has initiated efforts to improve the traceability of security updates through Security Patch Levels (SPLs) assigned to devices. While this initiative has improved the traceability of vulnerabilities, it has not entirely resolved the issues related to the timeliness and availability of security updates for end users. Recent studies on Android security updates have focused on the delay in rolling out security updates, largely attributing it to factors related to fragmentation. However, these studies fail to capture the entire Android ecosystem, as they primarily examine flagship devices or do not paint a comprehensive picture of the Android device lifecycle because their datasets span a short timeframe. To address this gap in the literature, we use a device-centric approach to analyze the security update behavior of Android devices. Our approach aims to understand the security update distribution behavior of OEMs (e.g., Samsung) by using a representative set of devices from each OEM, and to characterize the complete lifecycle of an average Android device. We obtained 367K official security update records from public sources, spanning from 2014 to 2023. Our dataset contains 599 unique devices from four major OEMs that are used in 97 countries and are associated with 109 carriers. We identify significant differences in the roll-out of security updates across OEMs, device models/types, and geographical regions. Our findings show that the reasons for the delay in the roll-out of security updates are not limited to fragmentation but also involve OEM-specific factors. Our analysis also uncovers key issues that can be readily addressed, as well as exemplary practices that OEMs can adopt immediately.
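As a rough illustration of the per-device metrics such an analysis involves, the sketch below computes two quantities for each device. The first is the median delay between an update's release and its Security Patch Level date. The second is the span between the device's first and last observed updates. The `(device, release_date, spl_date)` record layout and both metric definitions are assumptions for illustration, not the paper's dataset schema.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def summarize_devices(records):
    """Per-device median patch delay and support span, both in days.

    `records` is an iterable of (device, release_date, spl_date) tuples,
    a hypothetical simplification of an update log.
    """
    by_device = defaultdict(list)
    for device, released, spl in records:
        by_device[device].append((released, spl))

    summary = {}
    for device, updates in by_device.items():
        # Delay: how long after the SPL date the update actually shipped.
        delays = [(released - spl).days for released, spl in updates]
        releases = sorted(released for released, _ in updates)
        summary[device] = {
            "median_delay_days": median(delays),
            "support_days": (releases[-1] - releases[0]).days,
        }
    return summary
```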
Using Early Readouts to Mediate Featural Bias in Distillation
Durga Sivasubramanian
Ganesh Ramakrishnan
Anmol Mekala
WACV 2024 (2024)
Abstract
Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. This vulnerability is aggravated in distillation, where a (student) model may have less representational capacity than the corresponding teacher model. Often, knowledge of specific problem features is used to reweight instances and rebalance the learning process. We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers. We show that these early readouts automatically identify problem instances or groups in the form of confident, incorrect predictions. We improve group fairness measures across benchmark datasets by leveraging these signals to mediate between the teacher logits and the supervised label. We extend our results to the closely related but distinct problem of domain generalization, which also critically depends on the quality of learned features. We provide secondary analyses that bring insight into the role of feature learning in supervision and distillation.
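The abstract does not give the exact rule for mediating between teacher logits and the supervised label. The sketch below assumes a simple per-instance interpolation. Instances on which an early readout is confident (maximum probability above a threshold) but wrong are flagged, and their training target leans more heavily on the one-hot label. The threshold, the two mixing weights, and the direction of the reweighting are all hypothetical choices.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mediated_targets(early_logits, teacher_logits, labels,
                     conf_threshold=0.9, alpha_flagged=0.8, alpha_default=0.5):
    """Hypothetical mix of one-hot labels and teacher soft labels.

    Returns (targets, flagged): flagged marks confident-but-wrong early
    readouts; alpha is the weight given to the one-hot label.
    """
    early_probs = softmax(early_logits)
    confident = early_probs.max(axis=-1) > conf_threshold
    wrong = early_probs.argmax(axis=-1) != labels
    flagged = confident & wrong

    one_hot = np.eye(teacher_logits.shape[-1])[labels]
    alpha = np.where(flagged, alpha_flagged, alpha_default)[:, None]
    targets = alpha * one_hot + (1 - alpha) * softmax(teacher_logits)
    return targets, flagged
```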
LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals
Ricardo Martin-Brualla
Arjun Karpur
Guilherme Perrotta
Proc. CVPR'23 (in submission) (2024) (to appear)
Abstract
Finding localized correspondences across different images of the same object is crucial to understanding its geometry. In recent years, this problem has seen remarkable progress with the advent of deep learning-based local image features and learnable matchers. Still, learnable matchers often underperform when image pairs share only small regions of co-visibility (i.e., wide camera baselines). To address this problem, we leverage recent progress in coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks and enhances their capabilities by integrating noisy, estimated 3D signals to boost correspondence estimation. When integrating 3D signals into the matcher model, we show that a suitable positional encoding is critical to effectively make use of the low-dimensional 3D information. We experiment with two different 3D signals (normalized object coordinates and monocular depth estimates) and evaluate our method on large-scale (synthetic and real) datasets containing object-centric image pairs across wide baselines. We observe strong feature matching improvements compared to 2D-only methods, with up to +6% total recall and +28% precision at fixed recall. Additionally, we demonstrate that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs, with an improvement of up to 8.6% over the 2D-only approach.
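The abstract stresses that a suitable positional encoding is needed to exploit low-dimensional 3D signals, but it does not name one. A common option, assumed here for illustration, is a NeRF-style Fourier encoding that lifts each coordinate to sines and cosines at octave-spaced frequencies. `num_freqs` is a hypothetical parameter.

```python
import numpy as np


def fourier_encode(x: np.ndarray, num_freqs: int = 4) -> np.ndarray:
    """Fourier-feature encoding of low-dimensional signals (assumed scheme).

    x has shape (..., d); the output has shape (..., d + 2 * d * num_freqs):
    the raw input followed by sin/cos features at frequencies 2^k * pi.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi                     # (L,)
    scaled = x[..., None] * freqs                                     # (..., d, L)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)   # (..., d, 2L)
    return np.concatenate([x, enc.reshape(*x.shape[:-1], -1)], axis=-1)
```

A 3D normalized object coordinate with `num_freqs=4` becomes a 27-dimensional vector. This gives the matcher higher-frequency features than the raw three numbers alone.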
SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling
Andrei Zanfir
Mihai Zanfir
Teodor Szente
International Conference on 3D Vision (2024)
Abstract
We present SPHEAR, an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings. We shift the paradigm away from classical non-rigid registration methods, which operate under various surface priors, increasing reconstruction fidelity and minimizing the required human intervention. Additionally, SPHEAR is a complete model that allows sampling not only diverse synthetic head shapes and facial expressions, but also gaze directions, high-resolution color textures, surface normal maps, and haircuts represented in detail, as strands. SPHEAR can be used for automatic realistic visual data generation, semantic annotation, and general reconstruction tasks. Compared to state-of-the-art approaches, our components are fast and memory-efficient, and experiments support the validity of our design choices and the accuracy of our registration, reconstruction, and generation techniques.
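One possible reading of registration via spherical embeddings is sketched below. It is an assumption, not SPHEAR's actual procedure. Both scan points and template vertices are mapped to unit vectors, and each scan point is matched to the template vertex whose embedding is nearest on the sphere, that is, the one with the largest cosine similarity. In practice the embeddings would come from a learned network; here they are given directly.

```python
import numpy as np


def to_sphere(v: np.ndarray) -> np.ndarray:
    """Project embedding vectors onto the unit sphere."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def spherical_correspondences(scan_emb: np.ndarray, template_emb: np.ndarray) -> np.ndarray:
    """Index of the nearest template vertex on the sphere for each scan point."""
    s = to_sphere(scan_emb)
    t = to_sphere(template_emb)
    cos_sim = s @ t.T  # (num_scan, num_template); larger means closer on the sphere
    return cos_sim.argmax(axis=1)
```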
Improved Inapproximability of VC Dimension and Littlestone’s Dimension via (Unbalanced) Biclique
ITCS 2023 (to appear)
Abstract
We study the complexity of computing (and approximating) VC Dimension and Littlestone's Dimension when the concept class is given explicitly. We give a simple reduction from the Maximum (Unbalanced) Biclique problem to approximating VC Dimension and Littlestone's Dimension. With this connection, we derive a range of hardness-of-approximation results and running-time lower bounds. For example, under the (randomized) Gap-Exponential Time Hypothesis or the Strongish Planted Clique Hypothesis, we show a tight inapproximability result: both dimensions are hard to approximate to within a factor of o(log n) in polynomial time. These improve upon constant-factor inapproximability results from [Manurangsi and Rubinstein, COLT 2017].
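For reference, both dimensions can be computed by brute force when the concept class is given explicitly, which is the input model above. The sketch represents concepts as sets over a finite domain and runs in exponential time. The VC dimension is found by checking for a shattered set of each size. Littlestone's dimension uses the standard recursive definition: the empty class has dimension -1, and otherwise the dimension is the maximum, over points x that split the class, of 1 + min(Ldim(concepts containing x), Ldim(concepts not containing x)).

```python
from functools import lru_cache
from itertools import combinations


def vc_dimension(domain, concepts):
    """Largest d such that some d-subset of `domain` is shattered by `concepts`."""
    concepts = [frozenset(c) for c in concepts]
    best = 0
    for d in range(1, len(domain) + 1):
        # A set is shattered when all 2^d labelings appear as intersections.
        if any(len({c & frozenset(s) for c in concepts}) == 2 ** d
               for s in combinations(domain, d)):
            best = d
        else:
            break  # subsets of shattered sets are shattered, so stop here
    return best


def littlestone_dimension(domain, concepts):
    """Littlestone's dimension via the recursive mistake-tree definition."""

    @lru_cache(maxsize=None)
    def ldim(cls):
        if not cls:
            return -1
        best = 0
        for x in domain:
            with_x = frozenset(c for c in cls if x in c)
            without_x = cls - with_x
            if with_x and without_x:  # x splits the class
                best = max(best, 1 + min(ldim(with_x), ldim(without_x)))
        return best

    return ldim(frozenset(frozenset(c) for c in concepts))
```

For thresholds {x : x >= t} on a 4-point line, this gives VC dimension 1 but Littlestone's dimension 2, matching the intuition that an online learner can be forced into about log2(5) mistakes by binary search.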
Abstract
Contrastive learning is a powerful framework for learning self-supervised representations that generalize well to downstream supervised tasks.
We show that multiple existing contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel.
We then prove that a simple representation obtained by combining this kernel with PCA minimizes the worst-case approximation error of linear predictors, under the straightforward assumption that positive pairs have similar labels.
Our analysis is based on a decomposition of the target function in terms of the eigenfunctions of a positive-pair Markov chain, and a surprising equivalence between these eigenfunctions and the output of Kernel PCA.
We give generalization bounds for downstream linear prediction using our kernel PCA representation, and show empirically on a set of synthetic tasks that applying kernel PCA to contrastive learning models can indeed approximately recover the Markov chain eigenfunctions, although the accuracy depends on the kernel parameterization as well as on the augmentation strength.
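Kernel PCA itself is standard, and a minimal NumPy version is sketched below. The positive-pair kernel from the abstract is not specified here, so the sketch accepts any positive semidefinite kernel matrix. It centres the kernel matrix and projects onto the top eigenvectors, scaled by the square roots of their eigenvalues.

```python
import numpy as np


def kernel_pca(K: np.ndarray, k: int) -> np.ndarray:
    """Embed the n training points using the top-k kernel principal components.

    K is an (n, n) PSD kernel matrix. The (n, k) output has orthogonal
    columns whose squared norms equal the top-k eigenvalues of the
    centred kernel matrix.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n     # centring matrix
    Kc = H @ K @ H                          # kernel of mean-centred features
    eigvals, eigvecs = np.linalg.eigh(Kc)   # ascending order
    idx = np.argsort(eigvals)[::-1][:k]     # indices of the top-k eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)  # guard against tiny negative round-off
    return eigvecs[:, idx] * np.sqrt(lam)
```

With a linear kernel K = X Xᵀ, this reduces to ordinary PCA scores of the centred data, up to the sign of each component.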
User Attitudes Towards Controls for Ad Interests Estimated On-device by the Browser
Yuan Chen
Theodore Olsauskas-Warren
Symposium on Usable Security and Privacy (USEC) (2023)
Abstract
Online behavioral advertising is a double-edged sword. While relevant display ads are generally considered useful, opaque tracking based on third-party cookies has spread unchecked and is deemed privacy-intrusive. However, existing privacy-preserving approaches do not sufficiently balance the needs of both users and the ecosystem. In this work, we evaluate alternative browser controls. We leverage the idea of inferring interests on users’ devices and design novel browser controls to manage these interests. Through a mixed-methods approach, we studied how users feel about these controls. First, we conducted pilot interviews with 9 participants to test two design directions. Second, we ran a survey with 2,552 respondents to measure how our final design compares with current cookie settings. Respondents reported a significantly higher level of perceived privacy and feeling of control when introduced to the concept of locally inferred interests with an option for removal.
Abstract
We provide an intuitive new algorithm for black-box stochastic optimization of unimodal functions, a function class that, as we observe empirically, can capture hyperparameter-tuning loss surfaces. Our method's convergence guarantee automatically adapts to Lipschitz constants and other problem difficulty parameters, recovering and extending prior results. We complement our theoretical development with experimental validation on hyperparameter tuning tasks.
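The abstract does not describe the algorithm, so the sketch below is a deliberately simple baseline and not the authors' method. It is a noisy ternary search: it averages repeated evaluations at two probe points and discards the third of the interval that, for a unimodal function, cannot contain the minimum. Its sample count is fixed by hand and does not adapt to Lipschitz constants as the method above does.

```python
import random


def noisy_ternary_search(f, lo, hi, iters=40, samples=100):
    """Minimize a unimodal function observed through noisy evaluations.

    Each probe averages `samples` evaluations of f; the outer third on the
    side of the worse probe is discarded at every iteration.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        f1 = sum(f(m1) for _ in range(samples)) / samples
        f2 = sum(f(m2) for _ in range(samples)) / samples
        if f1 < f2:
            hi = m2  # minimum cannot lie in (m2, hi]
        else:
            lo = m1  # minimum cannot lie in [lo, m1)
    return (lo + hi) / 2
```

For example, with a seeded `random.Random(0)` adding Gaussian noise of scale 0.001 to (x − 0.3)², the search over [0, 1] returns a point close to 0.3. Averaging only helps while the true gap between probe values exceeds the noise, which is why adaptive methods matter.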
Abstract
We propose two novel approaches to address a critical problem in reach measurement across multiple media: how to estimate the reach of an unobserved subset of buying groups (BGs) based on the observed reach of other subsets of BGs. Specifically, we propose a model-free approach and a model-based approach. The former provides a coarse estimate for the reach of any subset by leveraging the consistency among the reach of different subsets. Linear programming is used to capture the reach-consistency constraints, producing an upper and a lower bound for the reach of any subset. The latter provides a point estimate for the reach of any subset. Its key idea is to exploit the conditional independence model. In particular, the groups of the model are created by assuming each BG has either a high or a low reach probability in a group, and the weights of the groups are determined by solving a non-negative least squares (NNLS) problem. We also provide a framework that gives both confidence intervals and point estimates by integrating these two approaches with training-point selection and parameter fine-tuning through cross-validation. Finally, we evaluate the two approaches through experiments on synthetic data.
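A minimal sketch of the model-based idea follows, under several assumptions. Reach is a fraction of the population. Each latent group gives every BG either a hypothetical `high` (0.6) or `low` (0.1) reach probability. Within a group, BGs reach people independently, so the reach of a subset S in that group is 1 - prod_{i in S}(1 - p_i). Group weights are fitted with `scipy.optimize.nnls` on the observed subsets and then used to predict an unobserved subset. The probabilities and the synthetic weights are illustrative, and the weights are not constrained to sum to one.

```python
from itertools import combinations, product

import numpy as np
from scipy.optimize import nnls


def group_reach_matrix(subsets, num_bgs, high=0.6, low=0.1):
    """Rows: subsets of BGs; columns: latent groups (all high/low patterns)."""
    patterns = list(product([high, low], repeat=num_bgs))
    A = np.empty((len(subsets), len(patterns)))
    for r, subset in enumerate(subsets):
        for c, probs in enumerate(patterns):
            # Conditional independence within a group.
            A[r, c] = 1.0 - np.prod([1.0 - probs[i] for i in subset])
    return A


def fit_and_predict(observed, num_bgs, query):
    """Fit non-negative group weights on observed reaches; predict `query`."""
    subsets = list(observed)
    A = group_reach_matrix(subsets, num_bgs)
    weights, residual = nnls(A, np.array([observed[s] for s in subsets]))
    predicted = group_reach_matrix([query], num_bgs) @ weights
    return float(predicted[0]), weights, residual


# Synthetic check with hypothetical "true" weights: observe every subset except {0, 1, 2}.
observed_subsets = [s for r in (1, 2) for s in combinations(range(3), r)]
true_weights = np.array([0.1, 0.2, 0.0, 0.1, 0.3, 0.0, 0.2, 0.1])
observed = dict(zip(observed_subsets, group_reach_matrix(observed_subsets, 3) @ true_weights))
estimate, weights, residual = fit_and_predict(observed, 3, (0, 1, 2))
```

Because each group's reach can only grow as BGs are added, the non-negative fit guarantees that the estimate for {0, 1, 2} is at least as large as any observed sub-subset's reach.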
NeRF-Supervised Deep Stereo
Fabio Tosi
Daniele De Gregorio
Matteo Poggi
Computer Vision and Pattern Recognition (2023)
Abstract
We introduce a novel framework for training deep stereo networks effortlessly and without any ground truth. By leveraging state-of-the-art neural rendering solutions, we generate stereo training data from image sequences collected with a single handheld camera. On this data, a NeRF-supervised training procedure is carried out, in which we exploit rendered stereo triplets to compensate for occlusions and rendered depth maps as proxy labels. This results in stereo networks capable of predicting sharp and detailed disparity maps. Experimental results show that models trained under this regime yield a 30-40% improvement over existing self-supervised methods on the challenging Middlebury dataset, closing the gap with supervised models and, in most cases, outperforming them at zero-shot generalization.
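Two pieces of this pipeline can be sketched under assumptions. First, rendered depth Z becomes a disparity proxy label through the rectified-stereo relation d = f * B / Z, where f is the focal length in pixels and B is a chosen virtual baseline. Second, one plausible triplet loss for occlusions takes, per pixel, the smaller of the photometric errors against the left and right renderings. Those renderings are assumed to be already warped into the centre view, so a pixel occluded in one view can still be supervised by the other. The function names and the exact loss form are hypothetical.

```python
import numpy as np


def depth_to_disparity(depth, focal_px, baseline_m):
    """Disparity in pixels for a rectified pair: d = f * B / Z."""
    return focal_px * baseline_m / np.maximum(depth, 1e-6)  # avoid division by zero


def triplet_photometric_loss(center, left, right):
    """Mean over pixels of the per-pixel minimum L1 error (images: H x W x C)."""
    err_left = np.abs(center - left).mean(axis=-1)
    err_right = np.abs(center - right).mean(axis=-1)
    return np.minimum(err_left, err_right).mean()
```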
Robotic Table Tennis: A Case Study into a High Speed Learning System
Wenbo Gao
Navdeep Jaitly
Juhana Kangaspunta
Yuheng Kuang
Corey Lynch
Anish Shankar
Avi Singh
Grace Vesom
Peng Xu
Jon Abelian
Saminda Abeyruwan
Michael Ahn
Justin Boyd
Erwin Johan Coumans
Omar Escareno
Satoshi Kataoka
Gus Kouretas
Thinh Nguyen
Ken Oslund
Barney J. Reed
Robotics: Science and Systems (2023)
Abstract
We present a deep dive into a learning robotic system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and of precisely returning the ball to desired targets. This system combines a highly optimized and novel perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real-world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of ablation studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, the robustness of the perception system, and sensitivity to policy hyperparameters and choice of action space. A video demonstrating the components of our system and details of experimental results is included in the supplementary material.
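The abstract names latency as a key factor but does not say how it is compensated. One generic technique, shown here only as an assumption, is to forward-predict the ball's state by the measured latency with a ballistic model before the controller acts on it. The model uses gravity only and ignores drag and spin.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2, with the z axis pointing up


def predict_ball_position(position, velocity, latency_s):
    """Ballistic estimate of the ball position `latency_s` seconds ahead (no drag or spin)."""
    return position + velocity * latency_s + 0.5 * GRAVITY * latency_s ** 2
```

For a ball at the origin moving at 1 m/s along x, a 100 ms latency shifts the estimate to roughly (0.1, 0, -0.049) m. An error of that size matters at table tennis speeds.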