Frank Dabek
Authored Publications
Large-scale Incremental Processing Using Distributed Transactions and Notifications
Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, USENIX (2010)
Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.
We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.
Efficient Replica Maintenance for Distributed Storage Systems
Byung-Gon Chun
Andreas Haeberlen
Emil Sit
Hakim Weatherspoon
M. Frans Kaashoek
John Kubiatowicz
Robert Morris
2006 Symposium on Networked Systems Design and Implementation (NSDI-06), ACM, San Jose, CA
Proactive replication for data durability
Emil Sit
Andreas Haeberlen
Byung-Gon Chun
Hakim Weatherspoon
Robert Morris
M. Frans Kaashoek
John Kubiatowicz
5th International Workshop on Peer-to-Peer Systems (IPTPS 2006), Santa Barbara, CA
UsenetDHT: A Low Overhead Usenet Server
Practical, distributed network coordinates
M. Frans Kaashoek
Jinyang Li
Robert Morris
Computer Communication Review, vol. 34 (2004), pp. 113-118
Designing a DHT for Low Latency and High Throughput
Jinyang Li
Emil Sit
James Robertson
M. Frans Kaashoek
Robert Morris
NSDI (2004), pp. 85-98
Vivaldi: a decentralized network coordinate system
Towards a Common API for Structured Peer-to-Peer Overlays
Chord: a scalable peer-to-peer lookup protocol for internet applications
Ion Stoica
Robert Morris
David Liben-Nowell
David R. Karger
M. Frans Kaashoek
Hari Balakrishnan
IEEE/ACM Trans. Netw., vol. 11 (2003), pp. 17-32
Bankable Postage for Network Services
Multiprocessor Support for Event-Driven Programs
Nickolai Zeldovich
Alexander Yip
Robert Morris
David Mazières
M. Frans Kaashoek
USENIX Annual Technical Conference, General Track (2003), pp. 239-252
Building peer-to-peer systems with Chord, a distributed lookup service
Emma Brunskill
M. Frans Kaashoek
David R. Karger
Robert Morris
Ion Stoica
Hari Balakrishnan
HotOS (2001), pp. 81-86
Wide-Area Cooperative Storage with CFS