Berkin Akin

I am a Software Engineer at Google DeepMind, where I focus on compute performance for large-scale AI models deployed on TPUs. Prior to joining Google in 2019, I was a research scientist at Intel Labs. Before that, I received my PhD from Carnegie Mellon University.
Authored Publications
    Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs
    Anton Spiridonov
    Hao Xu
    Marie Charisse White
    Ping Zhou
    Suyog Gupta
    Yun Long
    Zhuo Wang
    IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2022)
    Abstract: On-device ML accelerators are becoming a standard in modern mobile system-on-chips (SoC). Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by these accelerators. However, existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms. In this work, we provide a two-pronged approach to this challenge: (i) a NAS-enabling infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants that provide flexible quality/performance trade-offs on ML accelerators, complementing the existing full and depthwise convolution based IBNs. Using this approach we target a state-of-the-art mobile platform, Google Tensor SoC, and demonstrate neural architectures that improve the quality-performance Pareto frontier for various computer vision (classification, detection, segmentation) as well as natural language processing tasks.
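    To make the search-space idea above concrete, here is a minimal sketch of a group-convolution-based inverted bottleneck (IBN) block, assuming TensorFlow/Keras. The layer choices, widths, and group count are illustrative assumptions, not the exact primitives of the paper's search spaces.

```python
import tensorflow as tf

def group_ibn_block(x, expansion=4, groups=4, out_filters=64, stride=1):
    """Inverted bottleneck that swaps the depthwise conv for a group conv.

    Hypothetical sketch: hyperparameters are placeholders, not the paper's.
    """
    in_filters = x.shape[-1]
    expanded = in_filters * expansion
    # 1x1 expansion.
    y = tf.keras.layers.Conv2D(expanded, 1, use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    # Grouped 3x3 spatial convolution instead of a depthwise convolution;
    # `groups` trades compute against representational capacity.
    y = tf.keras.layers.Conv2D(expanded, 3, strides=stride, padding="same",
                               groups=groups, use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    # 1x1 projection back to the output width.
    y = tf.keras.layers.Conv2D(out_filters, 1, use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # Residual connection when shapes allow it.
    if stride == 1 and in_filters == out_filters:
        y = tf.keras.layers.Add()([x, y])
    return y

inputs = tf.keras.Input(shape=(56, 56, 64))
model = tf.keras.Model(inputs, group_ibn_block(inputs))
model.summary()
```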
    Abstract: Edge TPUs are a domain of accelerators for low-power, edge devices and are widely used in various Google products such as Coral devices and Pixel 4. In this paper, we first discuss the major microarchitectural details of Edge TPUs. Then, we extensively evaluate three classes of Edge TPUs, covering both data-center and mobile-SoC ecosystems, that are used or in the pipeline to be used in Google products, across 423K unique convolutional neural networks. Building upon this extensive study, we discuss critical and interpretable microarchitectural insights about the studied classes of Edge TPUs. Finally, we present our ongoing efforts in developing high-accuracy learned machine learning models to estimate the major performance metrics of Edge TPU accelerators. These learned models enable significantly faster (on the order of milliseconds) evaluations of accelerators as an alternative to time-consuming cycle-accurate simulators and establish an exciting opportunity for rapid hardware/software co-design.
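    As an illustration of the learned performance-estimation idea, the following is a minimal sketch that fits a regressor from simple network descriptors to latency, assuming scikit-learn. The features, synthetic data, and model choice are placeholders, not the learned models from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Each row: [num_layers, total MACs (G), total params (M), avg channel width]
features = rng.uniform([4, 0.1, 0.5, 16], [100, 20.0, 80.0, 512], size=(5000, 4))
# Placeholder latencies; in practice these come from hardware measurement
# or a cycle-accurate simulator.
latency_ms = 0.4 * features[:, 1] + 0.02 * features[:, 0] + rng.normal(0, 0.1, 5000)

x_train, x_test, y_train, y_test = train_test_split(
    features, latency_ms, random_state=0)
estimator = GradientBoostingRegressor().fit(x_train, y_train)
# Millisecond-scale inference replaces a slow simulator run per candidate.
print("R^2 on held-out networks:", estimator.score(x_test, y_test))
```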
    Accelerator-aware Neural Network Design using AutoML
    Suyog Gupta
    On-device Intelligence Workshop at MLSys 2020 (2020)
    Abstract: While neural network hardware accelerators provide a substantial amount of raw compute throughput, the models deployed on them must be co-designed for the underlying hardware architecture to obtain the optimal system performance. We present a class of computer vision models designed using hardware-aware neural architecture search and customized to run on the Edge TPU, Google's neural network hardware accelerator for low-power, edge devices. For the Edge TPU in Coral devices, these models enable real-time image classification performance while achieving accuracy typically seen only with larger, compute-heavy models running in data centers. On Pixel 4's Edge TPU, these models improve the accuracy-latency tradeoff over existing SoTA mobile models.
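    As a small illustration of reading such an accuracy-latency tradeoff, the sketch below keeps only the Pareto-optimal candidates from a set of (latency, accuracy) measurements; the candidate names and numbers are hypothetical, not Edge TPU results.

```python
def pareto_frontier(candidates):
    """candidates: list of (name, latency_ms, accuracy).

    Lower latency and higher accuracy are both better; a candidate survives
    only if no other candidate is both faster and more accurate.
    """
    frontier = []
    for name, lat, acc in sorted(candidates, key=lambda c: (c[1], -c[2])):
        if not frontier or acc > frontier[-1][2]:
            frontier.append((name, lat, acc))
    return frontier

# Hypothetical candidate models.
models = [("a", 3.1, 0.742), ("b", 4.0, 0.751), ("c", 5.2, 0.749), ("d", 6.8, 0.768)]
for name, lat, acc in pareto_frontier(models):
    print(f"{name}: {lat:.1f} ms, top-1 {acc:.3f}")
```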
    Abstract: The looming end of Moore's Law and ascending use of deep learning drives the design of custom accelerators that are optimized for specific neural architectures. Accelerator design forms a challenging constrained optimization problem over a complex, high-dimensional, and structured input space with a costly-to-evaluate objective function. Existing approaches for accelerator design are sample-inefficient and do not transfer knowledge between related optimization tasks with different design constraints (e.g. area budget) or neural architecture configurations. In this work, we propose a transferable architecture exploration framework, dubbed Apollo, that leverages recent advances in black-box function optimization for sample-efficient accelerator design. We use Apollo to optimize accelerator configurations of a diverse set of neural architectures with alternative design constraints. We show that Apollo finds optimal design configurations more sample-efficiently than baseline approaches. We further show that transferring knowledge between target architectures with different design constraints helps to find optimal configurations faster. This encouraging outcome portrays a promising path forward in shortening the timeline for accelerator design.
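    To illustrate the kind of constrained design-space search described above, here is a toy Python sketch of a random-search baseline over a hypothetical accelerator parameter space with an area budget. The parameters, cost model, and sampling strategy are placeholders; Apollo itself relies on far more sample-efficient black-box optimizers than this baseline.

```python
import random

DESIGN_SPACE = {
    "pe_rows": [4, 8, 16, 32],
    "pe_cols": [4, 8, 16, 32],
    "sram_kb": [256, 512, 1024, 2048],
}
AREA_BUDGET = 30.0  # hypothetical mm^2 budget

def area(cfg):
    # Placeholder analytical area model.
    return 0.02 * cfg["pe_rows"] * cfg["pe_cols"] + 0.01 * cfg["sram_kb"]

def evaluate(cfg):
    # Stand-in for the expensive objective (e.g. simulated latency); in
    # practice this is where a simulator or learned cost model is queried.
    return 100.0 / (cfg["pe_rows"] * cfg["pe_cols"]) + 2048.0 / cfg["sram_kb"]

def random_config():
    return {k: random.choice(v) for k, v in DESIGN_SPACE.items()}

best_cfg, best_cost = None, float("inf")
for _ in range(200):  # sample-inefficient baseline: pure random search
    cfg = random_config()
    if area(cfg) > AREA_BUDGET:
        continue  # reject designs that violate the area constraint
    cost = evaluate(cfg)
    if cost < best_cost:
        best_cfg, best_cost = cfg, cost

print(best_cfg, round(best_cost, 3))
```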