Google Research

Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity

ICML (2019)


Any streaming algorithm is commonly judged by the quality of its solution, the memory footprint, and its required amount of computations. In this paper, we study the problem of maximizing a monotone submodular function subject to a cardinality constraint k. We propose SIEVESTREAMING++, a streaming algorithm that with one pass over the data, while keeping only O(k) elements, achieves the tight 1/2 approximation guarantee. The best previously known streaming algorithms either achieve a suboptimal approximation ratio 1/4 with Θ(k) memory or the tight approximation ratio 1/2 with O(k log k) memory. We then show that by buffering a small fraction of the data stream, and through a careful filtering procedure, one can heavily reduce the rounds of adaptive computations, thus can lower the computational complexity of SIEVE-STREAMING++ substantially. We then generalize our results to the more challenging multi-source streaming setting. We show how one can achieve the tight 1/2 approximation guarantee with a O(k) shared memory, while minimizing not only the rounds of computations but also the total number of communicated bits. Finally, we demonstrate the efficiency of our algorithms on real-world applications including data summarization for multisource streams of tweets and of YouTube videos

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work