Graph-based Discriminators: Sample Complexity and Expressiveness
Abstract
A basic question in learning theory is to identify if two
distributions are identical when we have access only to examples sampled from the distributions.
%
This basic task is considered, for example, in the context of
Generative Adversarial Networks (GANs), where a discriminator is trained to distinguish between a real-life distribution and a synthetic distribution.
%
Classically, we use a hypothesis class $H$ and claim that the two
distributions are distinct if for some $h\in H$ the expected value
on the two distributions is (significantly) different.
%This is the dominant approach used in training distribution discriminators.
Our starting point is the following fundamental problem: "is having
the hypothesis dependent on more than a single random example
beneficial". To address this challenge we define $k$-ary based
discriminators, which have a family of Boolean $k$-ary functions
$\G$. Each function $g\in \G$ naturally defines a hyper-graph,
indicating whether a given hyper-edge exists. A function $g\in \G$
distinguishes between two distributions, if the expected value of
$g$, on a $k$-tuple of i.i.d examples, on the two distributions is
(significantly) different.
We study the expressiveness of families of $k$-ary functions,
compared to the classical hypothesis class $H$, which is $k=1$. We
show a separation in expressiveness of $k+1$-ary versus $k$-ary
functions. This demonstrate the great benefit of having $k\geq 2$ as
distinguishers.
For $k\geq 2$ we introduce a notion similar to the VC-dimension, and
show that it controls the sample complexity. We proceed and provide upper and
lower bounds as a function of our extended notion of VC-dimension.