Differentially Private Model Personalization

Abhradeep Guha Thakurta
Adam Smith
(2021)

Abstract

We study personalization of supervised learning with user-level differential privacy. Consider a setting with many users, each of whom has a training data set drawn from their own distribution $P_i$. Assuming some shared structure among the problems $P_i$, can users collectively learn the shared structure---and solve their tasks better than they could individually---while preserving the privacy of their data? We formulate this question using joint, \textit{user-level} differential privacy---that is, we control what is leaked about each user's entire data set.
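For concreteness, the user-level guarantee can be stated in the standard $(\varepsilon, \delta)$ form (the notation here is ours, not necessarily the paper's): a randomized algorithm $M$ is $(\varepsilon, \delta)$-user-level differentially private if, for every pair of data sets $D, D'$ that differ in the entire contribution of a single user, and every measurable set $S$ of outputs,
\[
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta.
\]
The only difference from the usual item-level definition is the neighboring relation: a user may change all of their records at once, not just one.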

We provide algorithms that exploit popular non-private approaches in this domain like the Almost-No-Inner-Loop (ANIL) method, and give strong user-level privacy guarantees for our general approach. When the problems $P_i$ are linear regression problems with each user's regression vector lying in a common, unknown low-dimensional subspace, we show that our efficient algorithms satisfy nearly optimal estimation error guarantees. We also establish a general, information-theoretic upper bound via an exponential mechanism-based algorithm.
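One natural way to write this linear regression setting (again, the notation is ours): user $i$ observes samples $(x_{ij}, y_{ij})$ generated as
\[
y_{ij} = \langle x_{ij}, w_i^* \rangle + z_{ij}, \qquad w_i^* = U^* v_i^*,
\]
where $U^* \in \mathbb{R}^{d \times k}$ has orthonormal columns spanning the shared, unknown $k$-dimensional subspace ($k \ll d$), $v_i^* \in \mathbb{R}^k$ is user $i$'s individual coefficient vector, and $z_{ij}$ is observation noise. Learning the shared structure amounts to estimating $U^*$; personalization amounts to each user fitting their own $v_i^*$.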

Finally, we demonstrate empirically (through experiments on synthetic data sets) that our framework not only performs well in the studied linear regression setting, but also extends to other settings like logistic regression that are not captured by our estimation error analysis.
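As a hypothetical illustration of this kind of synthetic setup (the parameter values and variable names below are assumptions, not the paper's actual experimental configuration), data matching the subspace model above can be generated in a few lines of NumPy:

```python
import numpy as np

# Illustrative sketch only: all sizes and names here are assumptions,
# not the paper's experimental configuration.
rng = np.random.default_rng(0)
n_users, n_samples, d, k = 100, 20, 50, 5  # few samples per user: n_samples < d

# Shared structure: an orthonormal basis U* of a random k-dim subspace of R^d.
U_star, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Each user's regression vector w_i* = U* v_i* lies in that subspace.
V_star = rng.standard_normal((n_users, k))
W_star = V_star @ U_star.T                 # shape (n_users, d)

# Per-user observations: y_ij = <x_ij, w_i*> + noise.
X = rng.standard_normal((n_users, n_samples, d))
noise = 0.1 * rng.standard_normal((n_users, n_samples))
Y = np.einsum("usd,ud->us", X, W_star) + noise
```

With only 20 samples per user in ambient dimension 50, no single user has enough data to solve their regression alone, which is exactly the regime where learning the shared subspace collectively should help.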