Sotiris Apostolakis
Sotiris works on performance analysis and compiler optimizations for warehouse-scale systems. Before joining Google, he received his Ph.D. in Computer Science from Princeton University; his dissertation focused on compilers, program analysis, and automatic parallelization. Earlier, he received his Diploma in Electrical and Computer Engineering from the National Technical University of Athens in Greece.
Authored Publications
PROMPT: A Fast and Extensible Memory Profiling Framework
Ziyang Xu
Yebin Chon
Yian Su
Zujun Tan
Simone Campanoni
David I. August
Proceedings of the ACM on Programming Languages, 8, Issue OOPSLA (2024)
Abstract
Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging and tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique program trace summary, various memory profiler types have been developed. Yet, designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. As a result, the demand for fast and robust profilers frequently goes unmet. To bridge this gap, this paper presents PROMPT, a framework for the streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported to PROMPT with all their features preserved. By focusing on the core profiling logic, their code was reduced by more than 65% and their profiling overhead was reduced by 5.3× and 7.1×, respectively. To further underscore PROMPT's impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client's memory profiling needs while achieving more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.
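To make the division of labor concrete, here is a minimal Rust sketch of the kind of profiler the abstract describes: the developer supplies only the events of interest and the core profiling logic, while instrumentation is assumed to come from the framework. All names here (MemEvent, AccessCountProfiler) are hypothetical illustrations, not PROMPT's actual API.

```rust
use std::collections::HashMap;

/// Profiling events the developer declares interest in.
#[derive(Clone, Copy)]
enum MemEvent {
    Load { addr: usize, size: usize },
    Store { addr: usize, size: usize },
}

/// Core profiling logic: count accesses per address and total bytes touched.
#[derive(Default)]
struct AccessCountProfiler {
    counts: HashMap<usize, u64>,
    total_bytes: u64,
}

impl AccessCountProfiler {
    fn on_event(&mut self, ev: MemEvent) {
        let (addr, size) = match ev {
            MemEvent::Load { addr, size } | MemEvent::Store { addr, size } => (addr, size),
        };
        *self.counts.entry(addr).or_insert(0) += 1;
        self.total_bytes += size as u64;
    }

    fn hottest(&self) -> Option<(usize, u64)> {
        self.counts.iter().map(|(&a, &c)| (a, c)).max_by_key(|&(_, c)| c)
    }
}

fn main() {
    let mut prof = AccessCountProfiler::default();
    // Events that generated instrumentation would emit at runtime.
    for ev in [
        MemEvent::Load { addr: 0x1000, size: 8 },
        MemEvent::Store { addr: 0x1000, size: 8 },
        MemEvent::Load { addr: 0x2000, size: 4 },
    ] {
        prof.on_event(ev);
    }
    println!("hottest address: {:?}, bytes: {}", prof.hottest(), prof.total_bytes);
}
```

Everything outside `on_event` (instrumentation, event delivery, low-level optimization) is what a framework like PROMPT would take off the developer's plate.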
Safer at Any Speed: Automatic Context-Aware Safety Enhancement for Rust
Natalie Popescu
Ziyang Xu
David I. August
Amit Levy
Proceedings of the ACM on Programming Languages, 5, Issue OOPSLA (2021)
Abstract
Type-safe languages improve application safety by eliminating whole classes of vulnerabilities, such as buffer overflows, by construction. However, this safety sometimes comes at a performance cost. As a result, many modern type-safe languages provide escape hatches that allow developers to manually bypass safety checks. The relative value of performance versus safety, and the degree of performance obtained, depends on the application context, including user goals and the hardware on which the application will be executed. Since libraries may be used in many different contexts, library developers cannot make performance-safety trade-off decisions appropriate for all cases. Application developers can tune libraries themselves to increase safety or performance, but this requires extra effort and makes libraries less reusable. To address this problem, we present NADER, a Rust development tool that makes applications safer by automatically transforming unsafe code into equivalent safe code according to developer preferences and application context. In an end-to-end system evaluation in a given context, NADER automatically reintroduces numerous library bounds checks, in many cases making application code that uses popular Rust libraries safer with no corresponding loss in performance.
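The trade-off NADER automates can be shown in miniature: a hypothetical library function that skips bounds checks via `get_unchecked` for speed, next to the safe equivalent that a transformation like NADER's would prefer when the context shows the checks cost nothing. This is an illustration of the general unsafe-to-safe pattern, not NADER's actual output.

```rust
/// UNSAFE variant: the caller must guarantee every i < xs.len();
/// an out-of-range index is undefined behavior, not a panic.
fn sum_unchecked(xs: &[f64], idxs: &[usize]) -> f64 {
    let mut s = 0.0;
    for &i in idxs {
        s += unsafe { *xs.get_unchecked(i) };
    }
    s
}

/// Safe equivalent: xs[i] panics on an out-of-range index
/// instead of silently reading out of bounds.
fn sum_checked(xs: &[f64], idxs: &[usize]) -> f64 {
    idxs.iter().map(|&i| xs[i]).sum()
}

fn main() {
    let xs = [1.0, 2.0, 3.0];
    let idxs = [0, 2];
    assert_eq!(sum_unchecked(&xs, &idxs), sum_checked(&xs, &idxs));
    println!("both sums: {}", sum_checked(&xs, &idxs));
}
```

When the bounds checks are predictable and off the hot path, the two versions often perform identically, which is exactly the case NADER detects to reintroduce the safe form.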
MemoDyn: Exploiting Weakly Consistent Data Structures for Dynamic Parallel Memoization
Stephen R. Beard
Ayal Zaks
David I. August
27th IEEE International Conference on Parallel Architecture and Compilation Techniques (PACT) (2018)
Abstract
Several classes of algorithms for combinatorial search and optimization problems employ memoization data structures to speed up their serial convergence. However, accesses to these data structures impose dependences that obstruct program parallelization. Such programs often continue to function correctly even when queries into these data structures return a partial view of their contents. Weakening the consistency of these data structures can unleash new parallelism opportunities, potentially at the cost of additional computation. These opportunities must, therefore, be carefully exploited for overall speedup. This paper presents MemoDyn, a framework for parallelizing loops that access data structures with weakly consistent semantics. MemoDyn provides programming abstractions to express weak semantics, and consists of a parallelizing compiler and a runtime system that automatically and adaptively exploit the semantics for optimized parallel execution. Evaluation of MemoDyn shows that it achieves efficient parallelization, providing significant improvements over competing techniques in terms of both runtime performance and solution quality.
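A rough sketch of the weak-consistency idea, under my assumption (not the paper's mechanism) that a memoization lookup may simply miss rather than block: parallel workers consult a shared memo table with `try_lock`, so a query under contention returns a partial view, and workers may redo some computation but never wait on each other. The names here are illustrative, not MemoDyn's abstractions.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Memoized Fibonacci with weakly consistent table accesses:
/// correctness never depends on seeing the latest table contents.
fn fib(table: &Mutex<HashMap<u64, u64>>, n: u64) -> u64 {
    if n < 2 {
        return n;
    }
    // Weakly consistent read: if the lock is contended, skip the cache.
    if let Ok(guard) = table.try_lock() {
        if let Some(&v) = guard.get(&n) {
            return v;
        }
    }
    let v = fib(table, n - 1) + fib(table, n - 2);
    // Best-effort insert: again, never block on the table.
    if let Ok(mut guard) = table.try_lock() {
        guard.insert(n, v);
    }
    v
}

fn main() {
    let table = Mutex::new(HashMap::new());
    let table = &table;
    thread::scope(|s| {
        for n in [30u64, 32, 34] {
            s.spawn(move || println!("fib({n}) = {}", fib(table, n)));
        }
    });
}
```

A missed lookup only costs extra computation, which mirrors the paper's trade: weaker consistency buys parallelism at the possible price of redundant work.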
Speculatively Exploiting Cross-Invocation Parallelism
Jialu Huang
Thomas B. Jablin
Soumyadeep Ghosh
Jae W. Lee
David I. August
25th IEEE International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
Abstract
Automatic parallelization has shown promise in producing scalable multi-threaded programs for multi-core architectures. Most existing automatic techniques parallelize independent loops and insert global synchronization between loop invocations. For programs with many loop invocations, frequent synchronization often becomes the performance bottleneck. Some techniques exploit cross-invocation parallelism to overcome this problem. Using static analysis, they partition iterations among threads to avoid cross-thread dependences. However, this approach may fail if dependence pattern information is not available at compile time. To address this limitation, this work proposes SpecCross, the first automatic parallelization technique to exploit cross-invocation parallelism using speculation. With speculation, iterations from different loop invocations can execute concurrently, and the program synchronizes only on misspeculation. This allows SpecCross to adapt to dependence patterns that only manifest on particular inputs at runtime. Evaluation on eight programs shows that SpecCross achieves a geomean speedup of 3.43× over parallel execution without cross-invocation parallelization.
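The speculation scheme can be sketched as a toy model (mine, not SpecCross's actual runtime): iterations from two loop invocations run concurrently on private buffers while recording their write sets; disjoint write sets mean the speculation succeeded and both commit, while any overlap is a misspeculation that triggers serial re-execution.

```rust
use std::collections::HashSet;
use std::thread;

/// One loop invocation, run speculatively: writes go to a private
/// buffer and the set of touched indices is recorded for validation.
fn invocation(data: &[i64], start: usize, stride: usize) -> (Vec<(usize, i64)>, HashSet<usize>) {
    let mut writes = Vec::new();
    let mut touched = HashSet::new();
    for (i, &x) in data.iter().enumerate().skip(start).step_by(stride) {
        writes.push((i, x + 1));
        touched.insert(i);
    }
    (writes, touched)
}

fn main() {
    let mut data = vec![0i64; 8];
    // Two invocations whose write sets happen to be disjoint here
    // (even vs. odd indices); in general this is only known at runtime.
    let (r1, r2) = thread::scope(|s| {
        let d = &data;
        let h1 = s.spawn(move || invocation(d, 0, 2));
        let h2 = s.spawn(move || invocation(d, 1, 2));
        (h1.join().unwrap(), h2.join().unwrap())
    });

    if r1.1.is_disjoint(&r2.1) {
        // Speculation succeeded: commit both invocations' writes.
        for &(i, v) in r1.0.iter().chain(r2.0.iter()) {
            data[i] = v;
        }
        println!("committed speculatively: {data:?}");
    } else {
        // Misspeculation: discard buffers and re-execute serially.
        for &(i, v) in invocation(&data, 0, 2).0.iter() {
            data[i] = v;
        }
        for &(i, v) in invocation(&data, 1, 2).0.iter() {
            data[i] = v;
        }
        println!("misspeculated; serial re-execution: {data:?}");
    }
}
```

The point of the design is that synchronization (the validation and fallback) is paid only when the write sets actually conflict, rather than at every invocation boundary.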