Derek Bruening
Derek Bruening is the primary author of the DynamoRIO tool platform. Derek is a Software Engineer at Google where he works on the Dr. Memory memory debugging tool, which is built on top of DynamoRIO. Previously he built DynamoRIO-based tools at VMware and co-founded Determina, whose Memory Firewall security technology was based on DynamoRIO. Derek holds a PhD and MEng from MIT.
Authored Publications
Optimizing Binary Translation for Dynamically Generated Code
Byron Hawkins
Brian Demsky
Qin Zhao
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, IEEE Computer Society, Washington, DC, USA (2015), pp. 68-78
Dynamic binary translation serves as a core technology that enables a wide range of important tools such as profiling, bug detection, program analysis, and security. Many of the target applications often include large amounts of dynamically generated code, which poses a special performance challenge in maintaining consistency between the source application and the translated application. This paper introduces two approaches for optimizing binary translation of JITs and other dynamic code generators. First we present a system of efficient source code annotations that allow developers to demarcate dynamic code regions and identify code changes within those regions. The second technique avoids the annotation and source code requirements by automatically inferring the presence of a JIT and instrumenting its write instructions with translation consistency operations. We implemented these techniques in DynamoRIO and demonstrate performance improvements over the state-of-the-art DBT systems on JIT applications as high as 7.3x over base DynamoRIO and Pin.
Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications
Hyoun Kyu Cho
Rick Hank
Scott A. Mahlke
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), IEEE Computer Society, Washington, DC, USA
Transparent dynamic instrumentation
Qin Zhao
Saman Amarasinghe
Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments, ACM, New York, NY, USA (2012), pp. 133-144
Memory access bugs, including buffer overflows and uses of freed heap memory, remain a serious problem for programming languages like C and C++. Many memory error detectors exist, but most of them are either slow or detect a limited set of bugs, or both.
This paper presents AddressSanitizer, a new memory error detector. Our tool finds out-of-bounds accesses to heap, stack, and global objects, as well as use-after-free bugs. It employs a specialized memory allocator and code instrumentation that is simple enough to be implemented in any compiler, binary translation system, or even in hardware.
AddressSanitizer achieves efficiency without sacrificing comprehensiveness. Its average slowdown is just 73% yet it accurately detects bugs at the point of occurrence. It has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.
Practical Memory Checking with Dr. Memory
Qin Zhao
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, IEEE Computer Society, Los Alamitos, CA, USA (2011), pp. 213-223
Memory corruption, reading uninitialized memory, using freed memory, and other memory-related errors are among the most difficult programming bugs to identify and fix due to the delay and non-determinism linking the error to an observable symptom. Dedicated memory checking tools are invaluable for finding these errors. However, such tools are difficult to build, and because they must monitor all memory accesses by the application, they incur significant overhead. Accuracy is another challenge: memory errors are not always straightforward to identify, and numerous false positive error reports can make a tool unusable. A third obstacle to creating such a tool is that it depends on low-level operating system and architectural details, making it difficult to port to other platforms and difficult to target proprietary systems like Windows.
This paper presents Dr. Memory, a memory checking tool that operates on both Windows and Linux applications. Dr. Memory handles the complex and not fully documented Windows environment, and avoids reporting false positive memory leaks that plague traditional leak locating algorithms. Dr. Memory employs efficient instrumentation techniques; a direct comparison with the state-of-the-art Valgrind Memcheck tool reveals that Dr. Memory is twice as fast as Memcheck on average and up to four times faster on individual benchmarks.
Dynamic cache contention detection in multi-threaded applications
Qin Zhao
David Koh
Syed Raza
Weng-Fai Wong
VEE 2011; Proceedings of the 7th ACM SIGPLAN/SIGOPS International conference on virtual execution environments, ACM, New York, NY, pp. 27-37
In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy.
In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach --- a 5x slowdown on average relative to native execution --- is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications up to a factor of 12x. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.