While profile guided optimizations (PGO) and link time optimiza-tions (LTO) have been widely adopted, post link optimizations (PLO)have languished until recently when researchers demonstrated that late injection of profiles can yield significant performance improvements. However, the disassembly-driven, monolithic design of post link optimizers face scaling challenges with large binaries andis at odds with distributed build systems. To reconcile and enable post link optimizations within a distributed build environment, we propose Propeller, a relinking optimizer for warehouse scale work-loads. To enable flexible code layout optimizations, we introduce basic block sections, a novel linker abstraction. Propeller uses basic block sections to enable a new approach to PLO without disassembly. Propeller achieves scalability by relinking the binary using precise profiles instead of rewriting the binary. The overhead of relinking is lowered by caching and leveraging distributed compiler actions during code generation. Propeller has been deployed to production at Google with over tens of millions of cores executing Propeller optimized code at any time. An evaluation of internal warehouse-scale applications show Propeller improves performance by 1.1% to 8% beyond PGO and ThinLTO. Compiler tools such as Clang improve by 7% while MySQL improves by 1%. Compared to the state of the art binary optimizer, Propeller achieves comparable performance while lowering memory overheads by 30%-70% on large benchmarks.View details
Devirtualization is a compiler optimization that replaces indirect (virtual) function calls with direct calls. It is particularly effective in object-oriented languages, such as Java or C++, in which virtual methods are typically abundant.
We present a novel abstract model to express the lifetimes of C++ dynamic objects and invariance of virtual table pointers in the LLVM intermediate representation. The model and the corresponding implementation in Clang and LLVM enable full devirtualization of virtual calls whenever the dynamic type is statically known and elimination of redundant virtual table loads in other cases.
Due to the complexity of C++, this has not been achieved by any other C++ compiler so far. Although our model was designed for C++, it is also applicable to other languages that use virtual dispatch. Our benchmarks show an average of 0.8% performance improvement on real-world C++ programs, with more than 30% speedup in some cases. The implementation is already a part of the upstream LLVM/Clang and can be enabled with the -fstrict-vtable-pointers flag.View details
No Results Found
We're always looking for more talented, passionate people.