Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
Abstract
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple benchmarks across a spectrum of different model
sizes. MobileNetV2 is based on an inverted residual structure where the input and
output of the residual block are thin bottleneck layers, while the intermediate layer is an expanded representation that uses lightweight depthwise convolutions to filter features. Additionally, we find that it is important not to
use non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves
performance and provide the intuition that led to this design.
Finally, our approach allows a decoupling of the input/output
domains from the expressiveness of the transformation, which
provides a convenient framework for further analysis.
We measure our performance on the ImageNet \cite{Russakovsky:2015:ILS:2846547.2846559} classification, VOC image segmentation \cite{PASCAL}, and COCO object detection \cite{COCO} datasets,
and evaluate the trade-offs between accuracy, the number of multiply-adds,
and the number of parameters.
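
To make the block structure and the cost measures concrete, the following is a minimal sketch that counts parameters and multiply-adds for a single inverted residual block: a pointwise expansion, a depthwise convolution, and a linear pointwise projection back to a thin bottleneck. The function name, the expansion factor t, the kernel size k, the padding convention, and the rule for when the residual connection applies are assumptions made for illustration and are not stated in the abstract; biases and normalization parameters are ignored.

```python
def inverted_residual_cost(h, w, c_in, c_out, t=6, k=3, stride=1):
    """Count weights and multiply-adds for one inverted residual block.

    Illustrative assumptions: expansion factor t, k x k depthwise kernel,
    "same" padding, no biases or normalization parameters.
    """
    c_mid = t * c_in  # width of the expanded intermediate representation
    # Output resolution after the (possibly strided) depthwise convolution.
    h_out = -(-h // stride)  # ceiling division
    w_out = -(-w // stride)

    layers = [
        # (name, number of weights, output positions, activation)
        ("expand 1x1", c_in * c_mid, h * w, "nonlinear"),
        ("depthwise kxk", k * k * c_mid, h_out * w_out, "nonlinear"),
        # The projection into the narrow layer is kept linear.
        ("project 1x1", c_mid * c_out, h_out * w_out, "linear"),
    ]

    params = sum(weights for _, weights, _, _ in layers)
    # Each weight is applied once per output position.
    madds = sum(weights * positions for _, weights, positions, _ in layers)
    # Assumed rule: a skip connection needs matching input and output shapes.
    residual = stride == 1 and c_in == c_out
    return {"params": params, "madds": madds, "residual": residual}


if __name__ == "__main__":
    print(inverted_residual_cost(56, 56, 24, 24))
    print(inverted_residual_cost(56, 56, 24, 32, stride=2))
```

For stride 1 the multiply-add count reduces to h * w * t * c_in * (c_in + k^2 + c_out), so the cost is governed by the thin input and output widths and the expansion factor rather than by a wide layer at the block boundary.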