Derek Bruening

Derek Bruening is the primary author of the DynamoRIO tool platform. Derek is a Software Engineer at Google where he works on the Dr. Memory memory debugging tool, which is built on top of DynamoRIO. Previously he built DynamoRIO-based tools at VMware and co-founded Determina, whose Memory Firewall security technology was based on DynamoRIO. Derek holds a PhD and MEng from MIT.
Authored Publications
    Optimizing Binary Translation for Dynamically Generated Code
    Byron Hawkins
    Brian Demsky
    Qin Zhao
    Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, IEEE Computer Society, Washington, DC, USA (2015), pp. 68-78
    Dynamic binary translation serves as a core technology that enables a wide range of important tools such as profiling, bug detection, program analysis, and security. Many of the target applications often include large amounts of dynamically generated code, which poses a special performance challenge in maintaining consistency between the source application and the translated application. This paper introduces two approaches for optimizing binary translation of JITs and other dynamic code generators. First we present a system of efficient source code annotations that allow developers to demarcate dynamic code regions and identify code changes within those regions. The second technique avoids the annotation and source code requirements by automatically inferring the presence of a JIT and instrumenting its write instructions with translation consistency operations. We implemented these techniques in DynamoRIO and demonstrate performance improvements over the state-of-the-art DBT systems on JIT applications as high as 7.3x over base DynamoRIO and Pin.
    Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications
    Hyoun Kyu Cho
    Rick Hank
    Scott A. Mahlke
    Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), IEEE Computer Society, Washington, DC, USA
    AddressSanitizer: A Fast Address Sanity Checker
    Memory access bugs, including buffer overflows and uses of freed heap memory, remain a serious problem for programming languages like C and C++. Many memory error detectors exist, but most of them are either slow or detect a limited set of bugs, or both. This paper presents AddressSanitizer, a new memory error detector. Our tool finds out-of-bounds accesses to heap, stack, and global objects, as well as use-after-free bugs. It employs a specialized memory allocator and code instrumentation that is simple enough to be implemented in any compiler, binary translation system, or even in hardware. AddressSanitizer achieves efficiency without sacrificing comprehensiveness. Its average slowdown is just 73% yet it accurately detects bugs at the point of occurrence. It has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.
    Transparent dynamic instrumentation
    Qin Zhao
    Saman Amarasinghe
    Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments, ACM, New York, NY, USA (2012), pp. 133-144
    Practical Memory Checking with Dr. Memory
    Qin Zhao
    Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, IEEE Computer Society, Los Alamitos, CA, USA (2011), pp. 213-223
    Memory corruption, reading uninitialized memory, using freed memory, and other memory-related errors are among the most difficult programming bugs to identify and fix due to the delay and non-determinism linking the error to an observable symptom. Dedicated memory checking tools are invaluable for finding these errors. However, such tools are difficult to build, and because they must monitor all memory accesses by the application, they incur significant overhead. Accuracy is another challenge: memory errors are not always straightforward to identify, and numerous false positive error reports can make a tool unusable. A third obstacle to creating such a tool is that it depends on low-level operating system and architectural details, making it difficult to port to other platforms and difficult to target proprietary systems like Windows. This paper presents Dr. Memory, a memory checking tool that operates on both Windows and Linux applications. Dr. Memory handles the complex and not fully documented Windows environment, and avoids reporting false positive memory leaks that plague traditional leak locating algorithms. Dr. Memory employs efficient instrumentation techniques; a direct comparison with the state-of-the-art Valgrind Memcheck tool reveals that Dr. Memory is twice as fast as Memcheck on average and up to four times faster on individual benchmarks.
    Dynamic cache contention detection in multi-threaded applications
    Qin Zhao
    David Koh
    Syed Raza
    Weng-Fai Wong
    Proceedings of the 7th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, ACM, New York, NY, USA (2011), pp. 27-37
    In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy. In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach (a 5x slowdown on average relative to native execution) is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications up to a factor of 12x. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.