Prakash Prabhu


I am a Software Engineer at Google with experience in several areas of search and advertising, including mobile app query and app understanding, search over private corpora in the Cloud, and video ads feature engineering. My interests include information retrieval, distributed systems, parallel computing, program analysis, and compiler optimizations. I received a PhD in Computer Science from Princeton University.
Authored Publications
    Practical Performance Guarantees for Pipelined DNN Inference
    Kuikui Liu
    Proceedings of the 41st International Conference on Machine Learning (2024)
    This work optimizes pipeline parallelism of machine learning model inference by partitioning computation graphs into $k$ stages and minimizing the running time of the bottleneck stage. We design practical algorithms for this NP-complete problem and prove they are nearly optimal in practice by comparing against lower bounds obtained from solving novel mixed-integer programming (MIP) formulations. We apply these algorithms and lower-bound techniques to production models to achieve substantial improvements in the approximation guarantees, compared to simple combinatorial lower bounds. For example, our new MIP formulations improve the lower bounds enough to drop the geometric mean approximation ratio from $2.175$ to $1.082$ across production data with $k=16$ pipeline stages. This work shows that while bottleneck partitioning is theoretically hard, in practice we have a handle on the algorithmic side of the problem and much of the remaining challenge is in developing more accurate cost models to give to the partitioning algorithms.
    MemoDyn: Exploiting Weakly Consistent Data Structures for Dynamic Parallel Memoization
    Stephen R. Beard
    Ayal Zaks
    David I. August
    27th IEEE International Conference on Parallel Architecture and Compilation Techniques (PACT) (2018)
    Several classes of algorithms for combinatorial search and optimization problems employ memoization data structures to speed up their serial convergence. However, accesses to these data structures impose dependences that obstruct program parallelization. Such programs often continue to function correctly even when queries into these data structures return a partial view of their contents. Weakening the consistency of these data structures can unleash new parallelism opportunities, potentially at the cost of additional computation. These opportunities must, therefore, be carefully exploited for overall speedup. This paper presents MEMODYN, a framework for parallelizing loops that access data structures with weakly consistent semantics. MEMODYN provides programming abstractions to express weak semantics, and consists of a parallelizing compiler and a runtime system that automatically and adaptively exploit the semantics for optimized parallel execution. Evaluation of MEMODYN shows that it achieves efficient parallelization, providing significant improvements over competing techniques in terms of both runtime performance and solution quality.
    Speculatively Exploiting Cross-Invocation Parallelism
    Jialu Huang
    Thomas B. Jablin
    Soumyadeep Ghosh
    Jae W. Lee
    David I. August
    25th IEEE International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
    Automatic parallelization has shown promise in producing scalable multi-threaded programs for multi-core architectures. Most existing automatic techniques parallelize independent loops and insert global synchronization between loop invocations. For programs with many loop invocations, frequent synchronization often becomes the performance bottleneck. Some techniques exploit cross-invocation parallelism to overcome this problem. Using static analysis, they partition iterations among threads to avoid cross-thread dependences. However, this approach may fail if dependence pattern information is not available at compile time. To address this limitation, this work proposes SpecCross, the first automatic parallelization technique to exploit cross-invocation parallelism using speculation. With speculation, iterations from different loop invocations can execute concurrently, and the program synchronizes only on misspeculation. This allows SpecCross to adapt to dependence patterns that only manifest on particular inputs at runtime. Evaluation on eight programs shows that SpecCross achieves a geomean speedup of 3.43× over parallel execution without cross-invocation parallelization.