Truly Perfect Samplers for Data Streams and Sliding Windows

David P. Woodruff

Rajesh Jayaram

Samson Zhou

Principles of Database Systems (PODS) 2022 (2022)

Download Google Scholar

Abstract

In the G-sampling problem, the goal is to output an index i of a vector f∈ℝn, such that for all coordinates j∈[n], Pr[i=j]=(1±ϵ)G(fj)∑k∈[n]G(fk)+γ, where G:ℝ→ℝ≥0 is some non-negative function. If ϵ=0 and γ=1/poly(n), the sampler is called perfect. In the data stream model, f is defined implicitly by a sequence of updates to its coordinates, and the goal is to design such a sampler in small space. Jayaram and Woodruff (FOCS 2018) gave the first perfect Lp samplers in turnstile streams, where G(x)=|x|p, using polylog(n) space for p∈(0,2]. However, to date all known sampling algorithms are not truly perfect, since their output distribution is only point-wise γ=1/poly(n) close to the true distribution. This small error can be significant when samplers are run many times on successive portions of a stream, and leak potentially sensitive information about the data stream. In this work, we initiate the study of truly perfect samplers, with ϵ=γ=0, and comprehensively investigate their complexity in the data stream and sliding window models.

Research Areas

Algorithms and Theory

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Truly Perfect Samplers for Data Streams and Sliding Windows

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Truly Perfect Samplers for Data Streams and Sliding Windows

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities