Robert Hundt

Robert Hundt received a degree in Computer Science from the Technical University of Munich in 1992. Until 1999 he worked for Terrasat GmbH in Germany, a 20+ person R&D company he co-owned. He played many roles - from company lead to booth cat - while writing and optimizing software for surveying and navigation with satellite systems.

In 2000 he started working for Hewlett-Packard Company in California on bringing up the new and scalable high-level optimizer SYZYGY for the HP C/C++/FORTRAN compilers with a new inter-procedural optimizer, a new loop optimizer, and a new scalar optimizer. Before joining the compiler group, Robert was responsible for dynamic binary instrumentation for Intel Itanium processors, co-creating and designing the performance analysis tool HP Caliper.

Since the beginning of 2007, Robert has been working for Google. He has created various compiler and performance projects: he served as Tech Lead for compiler optimization for servers (x86), Android (ARM), and GPUs (open-source CUDA compiler), built datacenter profiling and performance analysis tools, and worked on Gmail/Apps performance, from Chrome to datacenter. For many years Robert was the SW lead for Google TPU - supercomputers to accelerate machine learning inference and training, which include the open-source TensorFlow compiler XLA. Today he is the TL for ML compilers, runtimes, and performance, for TPU, GPU, and CPU. In parallel, he works on the open-source High-Level Synthesis toolchain XLS and dabbles in Quantum Computing. He remains strongly engaged in compiler and datacenter research.

In real life, he enjoys spending time with his family, playing the piano (at which he sucks), playing volleyball (which he used to do fairly well), and everything related to delicious, high-quality food (his main reason for joining Google ;-)

Authored Publications
Quantum Computing for Programmers
Cambridge University Press, Cambridge CB2 8BS, United Kingdom (2022)
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi
Nishant Patil
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Ben Gelb
Tara Vazir Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
Dan Hurt
Julian Ibarz
Aaron Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
Rahul Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
ISCA (2017) (to appear)
GPUCC - An Open-Source GPGPU Compiler
Jingyue Wu
Mark Heffernan
Chris Leary
Bjarke Roune
Rob Springer
Xuetian Weng
Proceedings of the 2016 International Symposium on Code Generation and Optimization, ACM, New York, NY, pp. 105-116
Whare-Map: Heterogeneity in “Homogeneous” Warehouse-Scale Computers
Jason Mars
Lingjia Tang
Proceedings of the 2013 ACM/IEEE International Symposium on Computer Architecture (ISCA), IEEE (to appear)
JSWhiz - Static Analysis for JavaScript Memory Leaks
Proceedings of the 10th annual IEEE/ACM international symposium on Code generation and optimization, IEEE (2013)
Optimizing Google's Warehouse Scale Computers: The NUMA Experience
Lingjia Tang
Jason Mars
Robert Hagmann
The 19th IEEE International Symposium on High Performance Computer Architecture (2013)
Heterogeneity in “Homogeneous” Warehouse-Scale Computers: A Performance Opportunity
Jason Mars
Lingjia Tang
IEEE Computer Architecture Letters (CAL), Vol. 10 No. 2 (2011), pp. 29-32
MAO - an Extensible Micro-Architectural Optimizer
Easwaran Raman
Martin Thuresson
Neil Vachharajani
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, ACM (2011)
Bubble-Up: Increasing Utilization In Modern Warehouse Scale Computers Via Sensible Co-Locations
Jason Mars
Lingjia Tang
Kevin Skadron
Mary Lou Soffa
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011, IEEE, New York, NY, USA
RACEZ: A Lightweight and Non-Invasive Race Detection Tool for Production Applications
Tianwei Sheng
Neil Vachharajani
Stephane Eranian
ICSE, ACM (2011), pp. 401-410