Over the past decade, he and his research team have developed several new compiler and runtime technologies for multicore processors and accelerators, with a focus on dense matrix/tensor computations, stencil computations, and image processing pipelines. He is the original author and maintainer of Pluto, a source-to-source loop parallelization and optimization tool based on the polyhedral framework. The tool is widely used in the community for advanced experimentation with loop optimization, for optimizing scientific stencil codes, and in university courses that teach loop transformations. A key recent loop transformation technique developed by his group is diamond tiling, an efficient way to tile stencil computations while ensuring that tiles can start concurrently.
A significant part of his research has also addressed parallelization and code generation for distributed-memory parallel architectures, through several new techniques for constructing communication sets, data transformations, and dynamic scheduling. Another recent line of work from his team led to PolyMage, a high-performance domain-specific compiler for image processing pipelines. PolyMage introduced the first automatic approach to optimizing image processing pipelines through polyhedral tiling and fusion. He received a Google Faculty Research Award in 2015 for continued research and development on PolyMage.
At Google, he was a founding member of the MLIR project, with MLIR standing for either Multi-level Intermediate Representation or Machine Learning Intermediate Representation. MLIR was recently announced and open-sourced by Google. The project was initiated to deliver the next generation of optimizing compiler infrastructure, with a focus on serving the computational demands of AI and machine learning programming models. Within Google, one of the project's goals is to address the compiler challenges associated with the TensorFlow ecosystem. MLIR is a new intermediate representation designed to provide a unified, modular, and extensible infrastructure for progressively lowering dataflow compute graphs, potentially through loop nests, to high-performance target-specific code. MLIR shares similarities with traditional CFG-based three-address SSA representations (such as LLVM IR or the Swift Intermediate Language), but it also introduces notions from the polyhedral compiler framework as first-class concepts to allow powerful analysis and transformation in the presence of loop nests and multi-dimensional arrays. MLIR supports multiple front ends and back ends and uses LLVM IR as one of its primary code generation targets. It is thus a very useful infrastructure for developing new compilers, especially for the challenge of compiling emerging AI and machine learning programming languages and models for the plethora of specialized accelerator chips.
After returning from his sabbatical at Google, he plans to use the MLIR infrastructure as the vehicle for his future research and to continue collaborating with the Google Brain team.