Marco Fornoni
Marco Fornoni is a Staff Engineer at Google DeepMind. His research interests span AI, machine learning, and computer vision, with a current focus on efficient vision and language architectures.
He joined Google as part of the acquisition of Moodstocks, where he worked as a Research Engineer. He holds a PhD from EPFL and the Idiap Research Institute, with a thesis on visual scene recognition.
Authored Publications
On Label Granularity and Object Localization
Elijah Henry John Cole
Kimberly Wilber
Grant Van Horn
Xuan Yang
Pietro Perona
Serge Belongie
Andrew Howard
Oisin Mac Aodha
European Conference on Computer Vision, Springer (2022), pp. 604-620
Abstract
Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.
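The central experimental knob in this study is the granularity at which image-level labels are assigned before WSOL training. A minimal sketch of that idea is below; the taxonomy dictionary, sample format, and function names are illustrative assumptions, not the iNatLoc500 schema.

```python
# Sketch: relabel a WSOL training set at a coarser taxonomy level.
# The taxonomy entries and dataset format are illustrative assumptions.

TAXONOMY = {
    "great horned owl": {"genus": "Bubo", "family": "Strigidae",
                         "class": "Aves", "kingdom": "Animalia"},
    "snowy owl":        {"genus": "Bubo", "family": "Strigidae",
                         "class": "Aves", "kingdom": "Animalia"},
}

def relabel(samples, level):
    """Return (image, label) pairs with labels coarsened to `level`.

    `samples` is an iterable of (image_path, species_name) pairs;
    `level` is one of "genus", "family", "class", "kingdom".
    """
    return [(img, TAXONOMY[species][level]) for img, species in samples]

train = [("owl_001.jpg", "great horned owl"), ("owl_042.jpg", "snowy owl")]
print(relabel(train, "family"))  # both images now share the label "Strigidae"
```

Sweeping `level` over the taxonomy while keeping the WSOL algorithm fixed is the kind of comparison the paper reports as more impactful than swapping algorithms.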
Bridging the Gap between Object Detection and User Intent via Query-Modulation
Chaochao Yan
Kimberly Wilber
Alex Stark
Yin Cui
Boqing Gong
Andrew Howard
arXiv (2021)
Abstract
When interacting with objects through cameras or pictures, users often have a specific intent. For example, they may want to perform a visual search.
Since most object detection models rely on image pixels as their sole input, undesired results are not uncommon: most typically, no high-confidence detection on the object of interest, or a detection with the wrong class label. The issue is especially severe when running capacity-constrained mobile object detectors on-device.
In this paper we investigate techniques to modulate mobile detectors to explicitly account for the user intent, expressed as an embedding of a simple query.
Compared to standard detectors, query-modulated detectors show superior performance at detecting objects for a given user query. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors also outperform a specialized referring expression recognition system. Query-modulated detectors can also be trained to simultaneously solve for both localizing a user query and standard detection, even outperforming standard mobile detectors at the canonical COCO task.
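One way to picture the modulation described above is to condition the detector's intermediate features on the query embedding, for example with a feature-wise affine transform. The sketch below illustrates that general idea only; the FiLM-style mechanism, shapes, and names are assumptions, not the fusion architecture used in the paper.

```python
import numpy as np

# Sketch: feature-wise modulation of detector features by a query embedding.
# The scale-and-shift mechanism and all shapes are illustrative assumptions.

rng = np.random.default_rng(0)

def query_modulate(features, query_emb, w_gamma, w_beta):
    """Scale and shift per-channel detector features using the query.

    features:  (H, W, C) feature map from the detector backbone.
    query_emb: (D,) embedding of the user query (e.g. a class-name embedding).
    w_gamma, w_beta: (D, C) projection matrices learned during training.
    """
    gamma = query_emb @ w_gamma   # (C,) per-channel scale
    beta = query_emb @ w_beta     # (C,) per-channel shift
    return features * (1.0 + gamma) + beta

H, W, C, D = 16, 16, 64, 32
features = rng.standard_normal((H, W, C))
query_emb = rng.standard_normal(D)            # embedding of e.g. "coffee mug"
w_gamma = 0.01 * rng.standard_normal((D, C))
w_beta = 0.01 * rng.standard_normal((D, C))

modulated = query_modulate(features, query_emb, w_gamma, w_beta)
print(modulated.shape)  # (16, 16, 64): same shape, now query-conditioned
```

The downstream detection heads then operate on the query-conditioned features, so boxes matching the query can be scored higher without changing the detector's overall structure.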
SpotPatch: Parameter-Efficient Transfer Learning for Mobile Object Detection
Keren Ye
Adriana Kovashka
Menglong Zhu
Andrew Howard
Proceedings of the Asian Conference on Computer Vision (ACCV), Springer (2020)
Abstract
Deep learning based object detectors are commonly deployed on mobile devices to solve a variety of tasks. For maximum accuracy, each detector is usually trained to solve a single specific task and comes with a completely independent set of parameters. While this guarantees high performance, it is also highly inefficient, as each model has to be separately downloaded and stored. In this paper we address the question: can task-specific detectors be trained and represented as a shared set of weights, plus a very small set of additional weights for each task? The main contributions of this paper are the following: 1) we perform the first systematic study of parameter-efficient transfer learning techniques for object detection problems; 2) we propose a technique to learn a model patch whose size depends on the difficulty of the task to be learned, and validate our approach on 10 different object detection tasks. Our approach achieves accuracy comparable to previously proposed approaches while being significantly more compact.
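The weight-sharing scheme in the abstract can be pictured as a frozen base model plus a tiny per-task patch applied to selected layers. A minimal sketch under that assumption follows; the diagonal scale-and-bias parameterization and all names are illustrative, not the patch structure proposed in the paper.

```python
import numpy as np

# Sketch: one shared base weight matrix plus small per-task patches.
# The scale-and-bias patch parameterization is an illustrative assumption.

rng = np.random.default_rng(0)
base_w = rng.standard_normal((256, 256))  # shared across all tasks

def make_patch(dim):
    """Per-task patch: a per-output scale and bias (2 * dim extra params)."""
    return {"scale": np.ones(dim), "bias": np.zeros(dim)}

def task_forward(x, base_w, patch):
    """Apply the shared weights, then the task-specific patch."""
    return (x @ base_w) * patch["scale"] + patch["bias"]

patches = {"pets": make_patch(256), "products": make_patch(256)}
x = rng.standard_normal((1, 256))
y = task_forward(x, base_w, patches["pets"])

base_params = base_w.size
patch_params = sum(v.size for v in patches["pets"].values())
print(f"patch is {patch_params / base_params:.2%} of the base weights")  # ~0.78%
```

Only the patch needs to be downloaded and stored per task; the shared weights are reused, which is the storage saving the abstract refers to.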