Deniz Altınbüken

Deniz Altınbüken

I’m a Research Engineer at Google DeepMind. My work is focused on advancing the state-of-the-art in systems using AI and ML. Prior to joining Google I received a Ph.D. in Computer Science from Cornell University. During my Ph.D. I focused on building distributed systems that rely on strong foundations in distributed systems Theory. My primary research interests are in machine learning for systems, distributed systems, database systems, and operating systems.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Snowcat: Efficient Kernel Concurrency Testing using a Learned Coverage Predictor
    Sishuai Gong
    Dinglan Peng
    Pedro Fonseca
    Symposium on Operating Systems Principles (SOSP) (2023)
    Preview abstract Random-based approaches and heuristics are commonly used in kernel concurrency testing due to the massive scale of modern kernels and corresponding interleaving space. The lack of accurate and scalable approaches to analyze concurrent kernel executions makes existing testing approaches heavily rely on expensive dynamic executions to measure the effectiveness of a new test. Unfortunately, the high cost incurred by dynamic executions limits the breadth of the exploration and puts latency pressure on finding effective concurrent test inputs and schedules, hindering the overall testing effectiveness. This paper proposes Snowcat, a kernel concurrency testing framework that generates effective test inputs and schedules using a learned kernel block-coverage predictor. Using a graph neural network, the coverage predictor takes a concurrent test input and scheduling hints and outputs a prediction on whether certain important code blocks will be executed. Using this predictor, Snowcat can skip concurrent tests that are likely to be fruitless and prioritize the promising ones for actual dynamic execution. After testing the Linux kernel for over a week, Snowcat finds ∼17% more potential data races, by prioritizing tests of more fruitful schedules than existing work would have chosen. Snowcat can also find effective test inputs that expose new concurrency bugs with higher probability (1.4×∼2.6×), or reproduce known bugs more quickly (15×) than state-ofart testing tools. More importantly, Snowcat is shown to be more efficient at reaching a desirable level of race coverage in the continuous setting, as the Linux kernel evolves from version to version. In total, Snowcat discovered 17 new concurrency bugs in Linux kernel 6.1, of which 13 are confirmed and 6 are fixed. View details
    HALP: Heuristic Aided Learned Preference Eviction Policy for YouTube Content Delivery Network
    Zhenyu Song
    Kevin Chen
    Νikhil Sarda
    Eugene Brevdo
    Jimmy Coleman
    Xiao Ju
    Pawel Jurczyk
    Richard Schooler
    20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), USENIX Association, Boston, MA (2023), pp. 1149-1163
    Preview abstract Video streaming services are among the largest web applications in production, and a large source of downstream internet traffic. A large-scale video streaming service at Google, YouTube, leverages a Content Delivery Network (CDN) to serve its users. A key consideration in providing a seamless service is cache efficiency. In this work, we demonstrate machine learning techniques to improve the efficiency of YouTube's CDN DRAM cache. While many recently proposed learning-based caching algorithms show promising results, we identify and address three challenges blocking deployment of such techniques in a large-scale production environment: computation overhead for learning, robust byte miss ratio improvement, and measuring impact under production noise. We propose a novel caching algorithm, HALP, which achieves low CPU overhead and robust byte miss ratio improvement by augmenting a heuristic policy with machine learning. We also propose a production measurement method, impact distribution analysis, that can accurately measure the impact distribution of a new caching algorithm deployment in a noisy production environment. HALP has been running in YouTube CDN production as a DRAM level eviction algorithm since early 2022 and has reliably reduced the byte miss during peak by an average of 9.1% while expending a modest CPU overhead of 1.8%. View details
    Snowboard: Finding Kernel Concurrency Bugs through Systematic Inter-thread Communication Analysis
    Sishuai Gong
    Pedro Fonseca
    Proceedings of the 28th ACM Symposium on Operating Systems Principles (2021) (to appear)
    Preview abstract Kernel concurrency bugs are challenging to find because they depend on very specific thread interleavings and test inputs. While separately exploring kernel thread interleavings or test inputs has been closely examined, jointly exploring interleavings and test inputs has received little attention, in part due to the resulting vast search space. Using precious, limited testing resources to explore this search space and execute just the right concurrent tests in the proper order is critical. This paper proposes Snowboard a testing framework that generates and executes concurrent tests by intelligently exploring thread interleavings and test inputs jointly. The design of Snowboard is based on a concept called potential memory communication (PMC), a guess about pairs of tests that, when executed concurrently, are likely to perform memory accesses to shared addresses, which in turn may trigger concurrency bugs. To identify PMCs, Snowboard runs tests sequentially from a fixed initial kernel state, collecting their memory accesses. It then pairs up tests that write and read the same region into candidate concurrent tests. It executes those tests using the associated PMC as a scheduling hint to focus interleaving search only on those schedules that directly affect the relevant memory accesses. By clustering candidate tests on various features of their PMCs, Snowboard avoids testing similar behaviors, which would be inefficient. Finally, by executing tests from small clusters first, it prioritizes uncommon suspicious behaviors that may have received less scrutiny. Snowboard discovered 14 new concurrency bugs in Linux kernels 5.3.10 and 5.12-rc3, of which 12 have been confirmed by developers. Six of these bugs cause kernel panics and filesystem errors, and at least two have existed in the kernel for many years, showing that this approach can uncover hard-to-find, critical bugs. Furthermore, we show that covering as many distinct pairs of uncommon read/write instructions as possible is the test-prioritization strategy with the highest bug yield for a given test-time budget. View details
    Learned Indexes for a Google-scale Disk-based Database
    Hussam Abu-Libdeh
    Alex Beutel
    Lyric Pankaj Doshi
    Tim Klas Kraska
    Chris Olston
    (2020)
    Preview abstract There is great excitement about learned index structures, but understandable skepticism about the practicality of a new method uprooting decades of research on B-Trees. In this paper, we work to remove some of that uncertainty by demonstrating how a learned index can be integrated in a distributed, disk-based database system: Google’s Bigtable. We detail several design decisions we made to integrate learned indexes in Bigtable. Our results show that integrating learned index significantly improves the end-to-end read latency and throughput for Bigtable. View details