Alexey Alexandrov
Alexey Alexandrov has been working on performance and optimization since 2004. He received his PhD in Computer Science from Saratov State Technical University in Russia in 2002. He joined Google in 2013, where he has been working on fleet-wide performance efficiency tools since 2016. Before Google, he spent about 10 years at Intel leading the VTune performance analyzer team.
Authored Publications
Break Dancing: low overhead, architecture agnostic software branch tracing
22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES ’21) (2021)
Sampling-based Feedback Directed Optimization (FDO) methods such as AutoFDO and BOLT, which employ profiles continuously collected in live production environments, are commonly used in datacenter applications to attain significant performance benefits without the toil of maintaining representative load tests. Sampled profiles rely on hardware facilities such as Intel’s Last Branch Record (LBR), which are not currently available even on popular CPUs from ARM or AMD. Because not all architectures provide a hardware LBR feature, we present an architecture-agnostic approach to collecting LBR-like data. We use sampling and limited program tracing to capture LBR-like data from optimized, unmodified application binaries. Since the implementation is in user space, we can collect arbitrarily long LBR buffers, and by varying the sampling rate, we can reduce the runtime overhead to arbitrarily low values. We target runtime overheads of <2% when the profiler is on and zero when it is off. Given the size of a modern production fleet, this amortizes to a negligible fleet-wide collection cost.
We implemented a profiler that uses this method of software branch tracing. Using the SPEC2006 benchmarks, we also analyzed its overhead and the similarity of the data it collects to data from Intel LBR hardware. The results demonstrate profile quality and optimization efficacy on par with LBR-based AutoFDO, and show that the target profiling overhead is achievable even without any advanced tuning.
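To make the two knobs in the abstract concrete (a user-space buffer whose length is not fixed by hardware, and a sampling rate that bounds how much execution is traced), here is a minimal sketch. It assumes a hypothetical in-process stream of (source, target) taken-branch address pairs and does not model how the profiler actually intercepts branches; the names `BranchRecord`, `SoftwareBranchBuffer`, `sampled_trace`, and `traced_fraction` are hypothetical illustrations, not the paper's implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchRecord:
    source: int  # address of the taken branch instruction
    target: int  # address the branch jumped to


class SoftwareBranchBuffer:
    """Hypothetical LBR-like ring buffer kept in user space.

    A hardware LBR has a fixed number of entries; here the capacity is
    just a constructor argument, so it can be made as long as needed.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque with maxlen drops the oldest record once full, like a ring buffer.
        self._entries: deque = deque(maxlen=capacity)

    def record(self, source: int, target: int) -> None:
        self._entries.append(BranchRecord(source, target))

    def snapshot(self) -> list:
        return list(self._entries)


def sampled_trace(branches, period: int, window: int, capacity: int) -> list:
    """Trace `window` consecutive branches at the start of every `period` branches.

    Each completed window yields one LBR-like sample (the last `capacity`
    branches of that window). A window cut off by the end of the stream is dropped.
    """
    if not 0 < window <= period:
        raise ValueError("need 0 < window <= period")
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    samples = []
    buf = None
    for i, (source, target) in enumerate(branches):
        phase = i % period
        if phase == 0:
            buf = SoftwareBranchBuffer(capacity)  # start a fresh sample
        if phase < window:
            buf.record(source, target)
            if phase == window - 1:
                samples.append(buf.snapshot())
    return samples


def traced_fraction(period: int, window: int) -> float:
    """Share of branches traced; a rough proxy for overhead in this sketch only."""
    return window / period
```

For example, `sampled_trace([(i, i + 1) for i in range(10)], period=5, window=2, capacity=2)` returns two samples, one starting at branch 0 and one at branch 5. Raising `period` while keeping `window` fixed lowers `traced_fraction`, which is the sense in which, under this sketch's assumptions, a lower sampling rate trades data volume for lower overhead.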