
Prakash Prabhu
I am a Software Engineer at Google working broadly on machine learning (ML) compilers for TPUs, with a focus on ML inference. I have worked on pipeline parallelism methods that improve the performance of large transformer encoder models such as BERT-Large and of multi-query-attention auto-regressive decoders across multiple TPUs, and more recently on sub-byte quantized models such as Gemini Nano on Pixel Edge TPUs.
My interests include ML inference optimizations, distributed systems, parallel computing, program analysis, and compiler optimizations. I received my PhD in Computer Science from Princeton University.
Authored Publications
Practical Performance Guarantees for Pipelined DNN Inference
Kuikui Liu
Proceedings of the 41st International Conference on Machine Learning (2024), pp. 1655-1671
MemoDyn: Exploiting Weakly Consistent Data Structures for Dynamic Parallel Memoization
Stephen R. Beard
Ayal Zaks
David I. August
27th IEEE International Conference on Parallel Architecture and Compilation Techniques (PACT) (2018)
Speculatively Exploiting Cross-Invocation Parallelism
Jialu Huang
Thomas B. Jablin
Soumyadeep Ghosh
Jae W. Lee
David I. August
25th IEEE International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)