Adaptive Data Analysis in a Balanced Adversarial Model
Abstract
In adaptive data analysis, a mechanism receives n i.i.d. samples from an unknown distribution D, and is required to provide accurate estimates for a sequence of adaptively chosen statistical queries with respect to D. Hardt and Ullman [2014] and Steinke and Ullman [2015] showed that, in general, it is computationally hard to answer more than n^2 adaptive queries, assuming the existence of one-way functions.
However, these negative results rely strongly on an adversarial model that significantly favors the adversarial analyst over the mechanism, as the analyst, who chooses the adaptive queries, also chooses the underlying distribution D. This imbalance raises questions about the applicability of the obtained hardness results -- an analyst who has complete knowledge of the underlying distribution D would have little, if any, need to issue statistical queries to a mechanism that holds only a finite number of samples from D.
We consider more restricted adversaries, called balanced, each consisting of two separate algorithms: the sampler, which chooses the distribution and provides the samples to the mechanism, and the analyst, which chooses the adaptive queries but has no prior knowledge of the underlying distribution. We improve the quality of previous lower bounds by revisiting them using an efficient balanced adversary, under standard public-key cryptography assumptions. We show that these stronger hardness assumptions are unavoidable, in the sense that any computationally bounded balanced adversary that has the structure of all known attacks implies the existence of public-key cryptography.
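To make the separation of roles concrete, the following is a minimal illustrative sketch of the interaction, not the construction used in the lower bounds. All names (`Sampler`, `EmpiricalMechanism`, `Analyst`, `run_game`), the finite domain, the randomly weighted distribution, and the threshold queries are hypothetical choices made only for illustration. The point it shows is structural: the sampler alone knows D, while the analyst sees only the mechanism's past answers.

```python
import random
from typing import Callable, List

# A statistical query maps a sample point to a value in [0, 1].
Query = Callable[[int], float]


class Sampler:
    """Chooses the distribution D and draws i.i.d. samples from it."""

    def __init__(self, domain_size: int, rng: random.Random):
        # Hypothetical choice of D: random weights over {0, ..., domain_size - 1}.
        weights = [rng.random() for _ in range(domain_size)]
        total = sum(weights)
        self.probs = [w / total for w in weights]
        self.rng = rng

    def draw(self, n: int) -> List[int]:
        return self.rng.choices(range(len(self.probs)), weights=self.probs, k=n)

    def true_mean(self, q: Query) -> float:
        # Exact value of the query with respect to D.
        return sum(p * q(x) for x, p in enumerate(self.probs))


class EmpiricalMechanism:
    """Naive mechanism: answers each query with its empirical mean."""

    def __init__(self, samples: List[int]):
        self.samples = samples

    def answer(self, q: Query) -> float:
        return sum(q(x) for x in self.samples) / len(self.samples)


class Analyst:
    """Chooses queries adaptively; never sees D or the samples."""

    def __init__(self, domain_size: int):
        self.domain_size = domain_size
        self.history: List[float] = []

    def next_query(self) -> Query:
        # Toy adaptivity: the threshold depends on the previous answer.
        last = self.history[-1] if self.history else 0.5
        threshold = int(last * self.domain_size)
        return lambda x: 1.0 if x >= threshold else 0.0

    def observe(self, answer: float) -> None:
        self.history.append(answer)


def run_game(n: int, num_queries: int, domain_size: int = 16, seed: int = 0) -> float:
    """Runs the interaction and returns the largest error |answer - q(D)|."""
    rng = random.Random(seed)
    sampler = Sampler(domain_size, rng)
    mechanism = EmpiricalMechanism(sampler.draw(n))
    analyst = Analyst(domain_size)
    worst = 0.0
    for _ in range(num_queries):
        q = analyst.next_query()
        a = mechanism.answer(q)
        worst = max(worst, abs(a - sampler.true_mean(q)))
        analyst.observe(a)
    return worst
```

Calling `run_game(200, 5)` plays five adaptive rounds against 200 samples and reports the worst-case deviation from the true query values. In the unbalanced model, by contrast, the analyst would construct D itself and could choose queries with full knowledge of it.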