Gabriel Marin
Gabriel Marin has been at Google since 2014, working on profiling tools, performance analysis and optimization, networking benchmarks and audit logging tools. He received a Ph.D. in Computer Science from Rice University and a B.S. in Computer Science from the Politehnica University of Bucharest. He served as a postdoctoral researcher at Rice University and as a computer scientist at the Oak Ridge National Laboratory and the University of Tennessee.
Authored Publications
Break Dancing: low overhead, architecture agnostic software branch tracing
22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES ’21) (2021)
Sampling-based Feedback Directed Optimization (FDO) methods like AutoFDO and BOLT, which employ profiles continuously collected in live production environments, are commonly used in datacenter applications to attain significant performance benefits without the toil of maintaining representative load tests. Sampled profiles rely on hardware facilities like Intel’s Last Branch Record (LBR), which are not currently available even on popular CPUs from ARM or AMD. Since not all architectures include a hardware LBR feature, we present an architecture-agnostic approach to collect LBR-like data. We use sampling and limited program tracing to capture LBR-like data from optimized and unmodified application binaries. Since the implementation is in user space, we can collect arbitrarily long LBR buffers, and by varying the sampling rate, we can adjust the runtime overhead to arbitrarily low values. We target runtime overheads of <2% when the profiler is on and zero when it’s off. This amortizes to negligible fleet-wide collection cost given the size of a modern production fleet.
We implemented a profiler that uses this method of software branch tracing. Using the SPEC2006 benchmarks, we also analyzed its overhead and the similarity of the data it collects to data from the Intel LBR hardware. Results demonstrate profile quality and optimization efficacy at parity with LBR-based AutoFDO, and show that the target profiling overhead is achievable even without implementing any advanced tuning.