Sotiris Apostolakis

Sotiris works on performance analysis and compiler optimizations for warehouse-scale systems. Before joining Google, he received his Ph.D. in Computer Science from Princeton University, where his dissertation focused on compilers, program analysis, and automatic parallelization. Earlier, he received his Diploma in Electrical and Computer Engineering from the National Technical University of Athens in Greece.
Authored Publications
    PROMPT: A Fast and Extensible Memory Profiling Framework
    Ziyang Xu
    Yebin Chon
    Yian Su
    Zujun Tan
    Simone Campanoni
    David I. August
    Proceedings of the ACM on Programming Languages, Vol. 8, Issue OOPSLA (2024)
    Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging and tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its own program trace summary, various memory profiler types have been developed. Yet designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort, leaving aspirations for fast and robust profilers unfulfilled. To bridge this gap, this paper presents PROMPT, a framework for the streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported to PROMPT with all features preserved. By focusing on the core profiling logic, their code was reduced by more than 65%, and profiling overhead improved by 5.3× and 7.1×, respectively. To further underscore PROMPT's impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client's memory profiling needs while achieving a more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.
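The division of labor the abstract describes — the developer writes only event handlers, the framework owns instrumentation and event delivery — can be sketched in a toy form. This is an illustrative sketch only; the class and event names below are hypothetical, not PROMPT's actual API.

```python
# Toy event-driven memory profiler: the developer registers handlers for
# profiling events; a (here simulated) instrumentation layer emits events.

class MemoryProfiler:
    def __init__(self):
        self.handlers = {}          # event name -> list of callbacks

    def on(self, event):
        """Register a handler for a profiling event (e.g. 'load', 'store')."""
        def register(fn):
            self.handlers.setdefault(event, []).append(fn)
            return fn
        return register

    def emit(self, event, **info):
        """Called by the instrumentation layer once per runtime event."""
        for fn in self.handlers.get(event, []):
            fn(**info)

profiler = MemoryProfiler()
store_addrs = set()
loads_after_store = 0               # crude flow-dependence count

@profiler.on("store")
def record_store(addr):
    store_addrs.add(addr)

@profiler.on("load")
def record_load(addr):
    global loads_after_store
    if addr in store_addrs:
        loads_after_store += 1

# Simulated event trace for: store 0x10; load 0x10; load 0x20
for ev, addr in [("store", 0x10), ("load", 0x10), ("load", 0x20)]:
    profiler.emit(ev, addr=addr)

print(loads_after_store)            # one load reads a previously stored address
```

The point of the design is that the dependence-counting logic above is all the profiler author writes; everything about intercepting loads and stores lives in the framework.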
    Safer at Any Speed: Automatic Context-Aware Safety Enhancement for Rust
    Natalie Popescu
    Ziyang Xu
    David I. August
    Amit Levy
    Proceedings of the ACM on Programming Languages, Vol. 5, Issue OOPSLA (2021)
    Type-safe languages improve application safety by eliminating whole classes of vulnerabilities, such as buffer overflows, by construction. However, this safety sometimes comes with a performance cost. As a result, many modern type-safe languages provide escape hatches that allow developers to manually bypass safety guarantees. The relative value of performance to safety, and the degree of performance obtained, depend upon the application context, including user goals and the hardware upon which the application is to be executed. Since libraries may be used in many different contexts, library developers cannot make performance-safety trade-off decisions appropriate for all cases. Application developers can tune libraries themselves to increase safety or performance, but this requires extra effort and makes libraries less reusable. To address this problem, we present NADER, a Rust development tool that makes applications safer by automatically transforming unsafe code into equivalent safe code according to developer preferences and application context. In an end-to-end system evaluation in a given context, NADER automatically reintroduces numerous library bounds checks, in many cases making application code that uses popular Rust libraries safer with no corresponding loss in performance.
    MemoDyn: Exploiting Weakly Consistent Data Structures for Dynamic Parallel Memoization
    Prakash Prabhu
    Stephen R. Beard
    Ayal Zaks
    David I. August
    27th IEEE International Conference on Parallel Architecture and Compilation Techniques (PACT) (2018)
    Several classes of algorithms for combinatorial search and optimization problems employ memoization data structures to speed up their serial convergence. However, accesses to these data structures impose dependences that obstruct program parallelization. Such programs often continue to function correctly even when queries into these data structures return a partial view of their contents. Weakening the consistency of these data structures can unleash new parallelism opportunities, potentially at the cost of additional computation. These opportunities must, therefore, be carefully exploited for overall speedup. This paper presents MEMODYN, a framework for parallelizing loops that access data structures with weakly consistent semantics. MEMODYN provides programming abstractions to express weak semantics, and consists of a parallelizing compiler and a runtime system that automatically and adaptively exploit the semantics for optimized parallel execution. Evaluation of MEMODYN shows that it achieves efficient parallelization, providing significant improvements over competing techniques in terms of both runtime performance and solution quality.
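The core observation — a memo table can tolerate racy, stale reads because a miss only costs redundant recomputation, never a wrong answer — can be illustrated with a small sketch. This is not MEMODYN's API or implementation; it is a Python toy using a shared dict as the weakly consistent memo table.

```python
import threading

memo = {}                          # shared memo table; queried without locking

def fib(n):
    cached = memo.get(n)           # weakly consistent read: may miss an entry
    if cached is not None:         # another thread has just inserted
        return cached
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    memo[n] = result               # benign race: every thread writes the same value
    return result

# Four threads race on the same table; any interleaving yields the same answer.
threads = [threading.Thread(target=fib, args=(25,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(memo[25])                    # 75025 regardless of interleaving
```

In CPython the individual dict operations happen to be atomic, but the algorithmic point stands in any memory model: because a stale view only triggers extra computation, the threads need no synchronization on the table itself.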
    Speculatively Exploiting Cross-Invocation Parallelism
    Jialu Huang
    Prakash Prabhu
    Thomas B. Jablin
    Soumyadeep Ghosh
    Jae W. Lee
    David I. August
    25th IEEE International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
    Automatic parallelization has shown promise in producing scalable multi-threaded programs for multi-core architectures. Most existing automatic techniques parallelize independent loops and insert global synchronization between loop invocations. For programs with many loop invocations, frequent synchronization often becomes the performance bottleneck. Some techniques exploit cross-invocation parallelism to overcome this problem. Using static analysis, they partition iterations among threads to avoid cross-thread dependences. However, this approach may fail if dependence pattern information is not available at compile time. To address this limitation, this work proposes SpecCross, the first automatic parallelization technique to exploit cross-invocation parallelism using speculation. With speculation, iterations from different loop invocations can execute concurrently, and the program synchronizes only on misspeculation. This allows SpecCross to adapt to dependence patterns that only manifest on particular inputs at runtime. Evaluation on eight programs shows that SpecCross achieves a geomean speedup of 3.43× over parallel execution without cross-invocation parallelization.
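The speculation protocol described above — let a later invocation's iterations run before the barrier, then validate and roll back only on conflict — can be modeled sequentially in a few lines. This is a simplified illustration under stated assumptions (two invocations over a shared array, doubling and incrementing as stand-in loop bodies), not SpecCross's actual runtime.

```python
def run_with_speculation(inv1_writes, inv2_indices, data):
    """Run invocation 2 speculatively, overlapping invocation 1's writes."""
    snapshot = dict(data)                    # state when speculation begins
    # Invocation 2 runs speculatively against the (possibly stale) snapshot.
    spec_result = {i: snapshot[i] * 2 for i in inv2_indices}
    # Meanwhile invocation 1 completes its writes.
    for i in inv1_writes:
        data[i] += 1
    if set(inv2_indices) & set(inv1_writes):  # validation: conflict detected
        for i in inv2_indices:                # misspeculation: re-execute
            data[i] *= 2                      # against up-to-date state
        return data, False
    data.update(spec_result)                  # validation passed: commit
    return data, True

# Disjoint accesses: iterations of both invocations overlap safely,
# with no global barrier between the invocations.
state, ok = run_with_speculation([0, 1], [2, 3], {i: 1 for i in range(4)})
print(ok, state)                   # True {0: 2, 1: 2, 2: 2, 3: 2}
```

When the index sets do overlap, validation fails and the recovery path re-executes invocation 2 against the committed state, recovering exactly the sequential result — the overhead of synchronization is paid only on misspeculation.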
    NOELLE Offers Empowering LLVM Extensions
    Angelo Matni
    Enrico Armenio Deiana
    Yian Su
    Lukas Gross
    Souradip Ghosh
    Ziyang Xu
    Zujun Tan
    Ishita Chaturvedi
    Brian Homerding
    Tommy McMichen
    David I. August
    Simone Campanoni
    Proceedings of the 2022 International Symposium on Code Generation and Optimization (CGO)
    Modern and emerging architectures demand increasingly complex compiler analyses and transformations. As the emphasis on compiler infrastructure moves beyond support for peephole optimizations and the extraction of instruction-level parallelism, compilers should support custom tools designed to meet these demands with higher-level analysis-powered abstractions and functionalities of wider program scope. This paper introduces NOELLE, a robust open-source domain-independent compilation layer built upon LLVM providing this support. NOELLE extends abstractions and functionalities provided by LLVM, enabling advanced, program-wide code analyses and transformations. This paper shows the power of NOELLE by presenting a diverse set of 11 custom tools built upon it.
    SCAF: A Speculation-Aware Collaborative Dependence Analysis Framework
    Ziyang Xu
    Zujun Tan
    Greg Chan
    Simone Campanoni
    David I. August
    Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (2020)
    Program analysis determines the potential dataflow and control flow relationships among instructions so that compiler optimizations can respect these relationships to transform code correctly. Since many of these relationships rarely or never occur, speculative optimizations assert they do not exist while optimizing the code. To preserve correctness, speculative optimizations add validation checks to activate recovery code when these assertions prove untrue. This approach results in many missed opportunities because program analysis, and thus other optimizations, remain unaware of the full impact of these dynamically-enforced speculative assertions. To address this problem, this paper presents SCAF, a Speculation-aware Collaborative dependence Analysis Framework. SCAF learns of available speculative assertions via profiling, computes their full impact on memory dependence analysis, and makes this resulting information available for all code optimizations. SCAF is modular (adding new analysis modules is easy) and collaborative (modules cooperate to produce a result more precise than the confluence of all individual results). Relative to the best prior speculation-aware dependence analysis technique, by computing the full impact of speculation on memory dependence analysis, SCAF dramatically reduces the need for expensive-to-validate memory speculation in the hot loops of all 16 evaluated C/C++ SPEC benchmarks.
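The modular-and-collaborative structure the abstract describes can be sketched as dependence-query modules whose answers the framework combines, taking the most precise result. Everything below is illustrative — the module names, query encoding, and "needs-validation" tag are hypothetical, not SCAF's interfaces.

```python
# Each analysis module answers a dependence query; a speculation module may
# answer "no dependence" at the price of a runtime validation check.

NO_DEP, MAY_DEP = "no-dep", "may-dep"
observed_no_dep = {("store@12", "load@47")}   # hypothetical profile data

def static_alias(query):
    # Conservative static analysis: cannot disprove the dependence.
    return (MAY_DEP, None)

def profile_speculation(query):
    # Profiling never observed this dependence: assert it away, but
    # require validation (recovery code runs if the assertion fails).
    if query in observed_no_dep:
        return (NO_DEP, "needs-validation")
    return (MAY_DEP, None)

def combined(query, modules):
    """Collaborative answer: any module proving NO_DEP wins, preferring
    answers that need no runtime validation."""
    answers = [m(query) for m in modules]
    no_deps = [a for a in answers if a[0] == NO_DEP]
    if not no_deps:
        return (MAY_DEP, None)
    return min(no_deps, key=lambda a: a[1] is not None)

print(combined(("store@12", "load@47"), [static_alias, profile_speculation]))
# ('no-dep', 'needs-validation')
```

The payoff of making the combined answer visible framework-wide, rather than inside one speculative optimization, is that every client optimization can plan around both the disproved dependence and its validation cost.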
    Perspective: A Sensible Approach to Speculative Automatic Parallelization
    Ziyang Xu
    Greg Chan
    Simone Campanoni
    David I. August
    Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2020)
    The promise of automatic parallelization, freeing programmers from the error-prone and time-consuming process of making efficient use of parallel processing resources, remains unrealized. For decades, the imprecision of memory analysis limited the applicability of non-speculative automatic parallelization. The introduction of speculative automatic parallelization overcame these applicability limitations, but, even in the case of no misspeculation, these speculative techniques exhibit high communication and bookkeeping costs for validation and commit. This paper presents Perspective, a speculative-DOALL parallelization framework that maintains the applicability of speculative techniques while approaching the efficiency of non-speculative ones. Unlike current approaches, which subsequently apply speculative techniques to overcome the imprecision of memory analysis, Perspective combines a novel speculation-aware memory analyzer, new efficient speculative privatization methods, and a planning phase to select a minimal-cost set of parallelization-enabling transforms. By reducing speculative parallelization overheads in ways not possible with prior parallelization systems, Perspective obtains higher overall program speedup (23.0× for 12 general-purpose C/C++ programs running on a 28-core shared-memory commodity machine) than Privateer (11.5×), the prior automatic DOALL parallelization system with the highest applicability.
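The planning phase mentioned above — choosing a low-cost set of enabling transforms that together remove every parallelization blocker — has the shape of a weighted set-cover problem. The sketch below is a toy greedy planner over hypothetical transform names and costs; it illustrates the flavor of such a step, not Perspective's actual algorithm.

```python
# Hypothetical blockers and transforms: name -> (cost, dependences removed).
blockers = {"dep1", "dep2", "dep3"}
transforms = {
    "privatize-A":   (1,  {"dep1"}),
    "speculate-all": (10, {"dep1", "dep2", "dep3"}),
    "reduce-sum":    (1,  {"dep2"}),
    "speculate-B":   (2,  {"dep3"}),
}

def plan(blockers, transforms):
    """Greedy weighted set cover: repeatedly pick the transform with the
    best cost per newly covered dependence until all blockers are removed."""
    uncovered, chosen, total = set(blockers), [], 0
    while uncovered:
        name, (cost, removes) = min(
            ((n, t) for n, t in transforms.items() if t[1] & uncovered),
            key=lambda item: item[1][0] / len(item[1][1] & uncovered))
        chosen.append(name)
        total += cost
        uncovered -= removes
    return chosen, total

chosen, cost = plan(blockers, transforms)
print(sorted(chosen), cost)
```

Here the planner prefers the two cheap targeted transforms plus one narrow speculation over blanket speculation, mirroring the paper's theme of paying for speculation only where cheaper enabling transforms do not suffice.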
    Architectural Support for Containment-based Security
    Hansen Zhang
    Soumyadeep Ghosh
    Jordan Fix
    Stephen R. Beard
    Nayana P. Nagendra
    Taewook Oh
    David I. August
    Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2019)
    Software security techniques rely on correct execution by the hardware. Securing hardware components has been challenging due to their complexity and the proportionate attack surface they present during their design, manufacture, deployment, and operation. Recognizing that external communication represents one of the greatest threats to a system's security, this paper introduces the TrustGuard containment architecture. TrustGuard contains malicious and erroneous behavior using a relatively simple and pluggable gatekeeping hardware component called the Sentry. The Sentry bridges a physical gap between the untrusted system and its external interfaces. TrustGuard allows only communication that results from the correct execution of trusted software, thereby preventing the ill effects of actions by malicious hardware or software from leaving the system. The simplicity and pluggability of the Sentry, which is implemented in less than half the lines of code of a simple in-order processor, enables additional measures to secure this root of trust, including formal verification, supervised manufacture, and supply chain diversification, with less than a 15% impact on performance.
    Hardware MultiThreaded Transactions
    Jordan Fix
    Nayana P. Nagendra
    Hansen Zhang
    Sophie Qiu
    David I. August
    Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2018)
    Speculation with transactional memory systems helps programmers and compilers produce profitable thread-level parallel programs. Prior work shows that supporting transactions that can span multiple threads, rather than requiring transactions to be contained within a single thread, enables new types of speculative parallelization techniques for both programmers and parallelizing compilers. Unfortunately, software support for multi-threaded transactions (MTXs) comes with significant additional inter-thread communication overhead for speculation validation. This overhead can make otherwise good parallelization unprofitable for programs with sizeable read and write sets. Some programs using these prior software MTXs overcame this problem through significant efforts by expert programmers to minimize these sets and optimize communication, capabilities which compiler technology has been unable to equivalently achieve. Instead, this paper makes speculative parallelization less laborious and more feasible through low-overhead speculation validation, presenting the first complete design, implementation, and evaluation of hardware MTXs. Even with maximal speculation validation of every load and store inside transactions of tens to hundreds of millions of instructions, profitable parallelization of complex programs can be achieved. Across 8 benchmarks, this system achieves a geomean speedup of 99% over sequential execution on a multicore machine with 4 cores.