Marco Fornoni

Marco Fornoni is a Software Engineer in Google Research. His research interests are in the field of machine learning and computer vision, and his current focus is on mobile models and architectures. He joined Google as part of the acquisition of Moodstocks, where he worked as a Research Engineer. He holds a PhD from EPFL and Idiap Research Institute, with a thesis on visual scene recognition.
Authored Publications
    On Label Granularity and Object Localization
    Elijah Henry John Cole
    Kimberly Wilber
    Grant Van Horn
    Xuan Yang
    Pietro Perona
    Serge Belongie
    Andrew Howard
    Oisin Mac Aodha
    European Conference on Computer Vision, Springer (2022), pp. 604-620
    Abstract: Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.
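    The central knob in the study above is the level of a class taxonomy from which training labels are drawn. A minimal sketch of that idea, with a toy taxonomy and a `relabel` helper that are purely illustrative (not from the paper or the iNatLoc500 dataset):

```python
# Toy class taxonomy: each fine-grained label maps to its chain of
# ancestors, ordered from most specific to most general.
# (Illustrative example only; the paper uses the iNaturalist taxonomy.)
TAXONOMY = {
    "great horned owl": ["great horned owl", "owl", "bird", "animal"],
    "barn owl":         ["barn owl", "owl", "bird", "animal"],
    "bald eagle":       ["bald eagle", "eagle", "bird", "animal"],
}

def relabel(label, level):
    """Map a fine-grained label to a coarser one by walking up the taxonomy.

    level=0 keeps the original label; larger values give coarser labels,
    clamped at the root ("animal" in this toy taxonomy).
    """
    chain = TAXONOMY[label]
    return chain[min(level, len(chain) - 1)]

# The same image can thus be labeled at several granularities; a WSOL
# model would be trained once per choice of `level` and the resulting
# localization accuracy compared across levels.
print(relabel("great horned owl", 0))  # great horned owl
print(relabel("great horned owl", 2))  # bird
```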
    Bridging the Gap between Object Detection and User Intent via Query-Modulation
    Chaochao Yan
    Liangchen Luo
    Kimberly Wilber
    Alex Stark
    Yin Cui
    Andrew Howard
    arXiv (2021)
    Abstract: When interacting with objects through cameras or pictures, users often have a specific intent. For example, they may want to perform a visual search. With most object detection models relying on image pixels as their sole input, undesired results are not uncommon. Most typically: lack of a high-confidence detection on the object of interest, or detection with a wrong class label. The issue is especially severe when operating capacity-constrained mobile object detectors on-device. In this paper we investigate techniques to modulate mobile detectors to explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard detectors, query-modulated detectors show superior performance at detecting objects for a given user query. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors also outperform a specialized referring expression recognition system. Query-modulated detectors can also be trained to simultaneously solve for both localizing a user query and standard detection, even outperforming standard mobile detectors at the canonical COCO task.
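    One common way to inject a query embedding into a detector's feature maps is FiLM-style channel-wise modulation. The sketch below is an illustrative fusion mechanism under that assumption, not the authors' exact architecture; all names and shapes are made up for the example:

```python
import numpy as np

def query_modulate(features, query_emb, w_gamma, w_beta):
    """FiLM-style modulation: scale and shift detector feature channels
    using learned projections of the query embedding.

    features:  (H, W, C) backbone feature map
    query_emb: (D,) embedding of the user's query (e.g., a class name)
    w_gamma, w_beta: (D, C) projection matrices (random stand-ins here)
    """
    gamma = query_emb @ w_gamma  # (C,) per-channel scale
    beta = query_emb @ w_beta    # (C,) per-channel shift
    # Broadcasting applies the same scale/shift at every spatial location,
    # biasing downstream detection heads toward query-relevant channels.
    return features * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 16, 32))  # toy feature map
query = rng.normal(size=(8,))          # toy query embedding
w_g = rng.normal(size=(8, 32))
w_b = rng.normal(size=(8, 32))

modulated = query_modulate(feats, query, w_g, w_b)
print(modulated.shape)  # (16, 16, 32)
```

    The modulated features keep the backbone's spatial resolution, so they can feed the usual detection heads unchanged; only the fusion step depends on the query.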
    SpotPatch: Parameter-Efficient Transfer Learning for Mobile Object Detection
    Keren Ye
    Adriana Kovashka
    Menglong Zhu
    Andrew Howard
    Proceedings of the Asian Conference on Computer Vision (ACCV), Springer (2020)
    Abstract: Deep learning based object detectors are commonly deployed on mobile devices to solve a variety of tasks. For maximum accuracy, each detector is usually trained to solve one single specific task, and comes with a completely independent set of parameters. While this guarantees high performance, it is also highly inefficient, as each model has to be separately downloaded and stored. In this paper we address the question: can task-specific detectors be trained and represented as a shared set of weights, plus a very small set of additional weights for each task? The main contributions of this paper are the following: 1) we perform the first systematic study of parameter-efficient transfer learning techniques for object detection problems; 2) we propose a technique to learn a model patch with a size that is dependent on the difficulty of the task to be learned, and validate our approach on 10 different object detection tasks. Our approach achieves similar accuracy as previously proposed approaches, while being significantly more compact.
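    The storage argument can be made concrete with a toy example: a shared set of weights downloaded once, plus a tiny per-task patch. Per-channel scale and bias is used below as one common parameter-efficient choice for illustration; SpotPatch's actual patches are learned and sized per task:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared detector weights, downloaded and stored once
# (toy single layer: 256 x 256).
shared_w = rng.normal(size=(256, 256))

def make_patch(out_dim):
    """Per-task patch: a per-channel scale and bias on the shared layer.
    Illustrative only; the paper learns the patch structure per task."""
    return {"scale": np.ones(out_dim), "bias": np.zeros(out_dim)}

def patched_forward(x, shared_w, patch):
    # Apply the shared layer, then the tiny task-specific adjustment.
    return (x @ shared_w) * patch["scale"] + patch["bias"]

patch = make_patch(256)
shared_params = shared_w.size
patch_params = patch["scale"].size + patch["bias"].size
# Each new task costs only the patch, not a full copy of the model.
print(f"patch is {patch_params / shared_params:.2%} of the shared weights")
```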