Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs

Anton Spiridonov; Berkin Akin; Hao Xu; Marie Charisse White; Ping Zhou; Suyog Gupta; Yanqi Zhou; Yun Long; Zhuo Wang

IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2022)

Abstract

On-device ML accelerators are becoming standard in modern mobile systems-on-chip (SoCs). Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by these accelerators. However, existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms. In this work, we provide a two-pronged approach to this challenge: (i) a NAS-enabling infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group-convolution-based inverted bottleneck (IBN) variants that provide flexible quality/performance trade-offs on ML accelerators, complementing the existing full- and depthwise-convolution-based IBNs. Using this approach, we target a state-of-the-art mobile platform, the Google Tensor SoC, and demonstrate neural architectures that improve the quality-performance Pareto frontier for various computer vision tasks (classification, detection, segmentation) as well as natural language processing tasks.
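
To illustrate why group convolutions sit between full and depthwise convolutions, the following minimal sketch counts weights and multiply-accumulates (MACs) for a single K x K convolution as the number of groups varies. It is not taken from the paper: the function name `conv_cost`, the layer shape, and the stride-1, same-padding assumption are all hypothetical choices made for illustration, and weight/MAC counts are only a rough proxy for latency on a real accelerator.

```python
def conv_cost(height, width, c_in, c_out, kernel, groups=1):
    """Return (weights, MACs) for one KxK convolution.

    Assumes stride 1 and 'same' padding, so the output feature map is
    height x width. Bias terms are ignored.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both c_in and c_out")
    # Each output channel only sees c_in / groups input channels.
    weights = kernel * kernel * (c_in // groups) * c_out
    # One MAC per weight per output spatial position.
    macs = weights * height * width
    return weights, macs


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    H = W = 14
    C = 128
    for name, g in [("full", 1), ("group, g=4", 4), ("depthwise", C)]:
        w, m = conv_cost(H, W, C, C, kernel=3, groups=g)
        print(f"{name:>12}: weights={w:>7}  MACs={m:>9}")
```

With `groups=1` the layer is a full convolution, with `groups=c_in` it is depthwise, and intermediate values trade model capacity against compute, which is the kind of quality/performance knob the search spaces in (ii) expose.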

Research Areas

Machine perception
