AutoAI2C: An Automated Hardware Generator for DNN Acceleration on both FPGA and ASIC
Abstract
Recent advancements in Deep Neural Networks (DNNs) and the slowing of Moore’s law have made domain-specific hardware accelerators for DNNs (i.e., DNN chips) a promising means for enabling more extensive DNN applications. However, designing DNN chips is challenging due to (1) the vast and non-standardized design space and (2) different DNN models’ varying performance preferences regarding hardware micro-architecture and dataflows. As a result, designing a DNN chip often takes a large team of inter-disciplinary experts months to years. To enable flexible and efficient DNN chip design, we propose AutoAI2C: a DNN chip generator that can automatically generate both FPGA- and ASIC-based DNN accelerator implementations (i.e., synthesizable hardware and deployment code) with optimized algorithm-to-hardware mapping, given the DNN model specification from mainstream machine learning frameworks (e.g., PyTorch). Specifically, AutoAI2C consists of two major components: (1) a Chip Predictor, which can efficiently and reliably predict a DNN accelerator’s energy, latency, and resource consumption using a customized graph-based intermediate accelerator representation, and (2) a Chip Builder, which can generate and optimize DNN accelerator designs by automatically exploring the design space based on target metrics and the Chip Predictor’s performance feedback. Extensive experiments show that our Chip Predictor’s predictions differ by <10% from real-measured ones. Furthermore, AutoAI2C-generated accelerators can achieve performance comparable to or better than (1.10× to 2.12× speedup) state-of-the-art accelerators, validating the effectiveness and advantages of AutoAI2C.
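The interplay between a predictor and a builder described above can be illustrated with a minimal sketch. It is not AutoAI2C's implementation: the names `DesignPoint`, `Prediction`, `predict`, and `explore` are hypothetical, the cost model is a toy analytical formula with made-up constants rather than the graph-based representation the abstract mentions, and the search is a plain exhaustive sweep under a resource budget.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DesignPoint:
    """Hypothetical accelerator configuration (illustrative parameters only)."""
    num_pes: int    # number of processing elements
    buffer_kb: int  # on-chip buffer size in KB


@dataclass(frozen=True)
class Prediction:
    latency_cycles: float
    energy_units: float
    resource_units: int


def predict(point: DesignPoint, total_macs: int, data_kb: int) -> Prediction:
    """Toy stand-in for a performance predictor (assumed cost model)."""
    # Assumption: compute time scales inversely with the number of PEs.
    compute_cycles = total_macs / point.num_pes
    # Assumption: data is streamed once per pass, and the number of passes
    # is the ceiling of (data size / buffer size).
    passes = -(-data_kb // point.buffer_kb)
    memory_cycles = 100.0 * data_kb * passes
    energy = 1.0 * total_macs + 50.0 * data_kb * passes
    resources = 4 * point.num_pes + point.buffer_kb
    return Prediction(compute_cycles + memory_cycles, energy, resources)


def explore(
    pe_options: Sequence[int],
    buffer_options: Sequence[int],
    total_macs: int,
    data_kb: int,
    resource_budget: int,
) -> Optional[Tuple[DesignPoint, Prediction]]:
    """Toy stand-in for a builder: sweep the design space using predictor feedback."""
    best: Optional[Tuple[DesignPoint, Prediction]] = None
    for num_pes, buffer_kb in product(pe_options, buffer_options):
        point = DesignPoint(num_pes, buffer_kb)
        pred = predict(point, total_macs, data_kb)
        if pred.resource_units > resource_budget:
            continue  # infeasible under the resource budget
        if best is None or pred.latency_cycles < best[1].latency_cycles:
            best = (point, pred)
    return best
```

Calling `explore((16, 64, 256), (64, 128, 256), total_macs=1_000_000, data_kb=256, resource_budget=600)` returns the feasible point with the lowest predicted latency, or `None` when no point fits the budget. A real flow would replace the sweep with a smarter search and the formula with a calibrated predictor; the sketch only shows how the search depends on predictor feedback.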