Jump to Content

Private Intersection-Sum Protocols with Applications to Attributing Aggregate Ad Conversions

Mihaela Ion
Benjamin Kreuter
Erhan Nergiz
Sarvar Patel
Shobhit Saxena
David Shanahan
2020 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 370-389

Abstract

In this work, we discuss our successful efforts for industry deployment of a cryptographic secure computation protocol. The problem we consider is privately computing aggregate conversion rate of advertising campaigns. This underlying functionality can be abstracted as Private Intersection-Sum (PI-Sum) with Cardinality. In this setting two parties hold datasets containing user identifiers, and one of the parties additionally has an integer value associated with each of its user identifiers. The parties want to learn the number of identifiers they have in common and the sum of the integer values associated with these users without revealing any more information about their private inputs. We identify the major properties and enabling factors which make the deployment of a cryptographic protocol possible, practical, and uniquely positioned as a solution for the task at hand. We describe our deployment setting and the most relevant efficiency measure, which in our setting is communication overhead rather than computation. We also present a monetary cost model that can be used as a unifying cost measure and the computation model which reflect out use-case: a low-priority batch computing. We present three PI-Sum with cardinality protocols: our currently deployed protocol, which relies on a Diffie-Hellman style double masking, and two new protocols which leverage more recent techniques for private set intersection (PSI) that use Random Oblivious Transfer and encrypted Bloom filters. We compare the later two protocol with our original solution when instantiated with different additively homomorphic encryption schemes. We implement our constructions and compare their costs. We also compare with recent generic approaches for computing on the intersection of two datasets and show that our best protocol has monetary cost that is 20× less than the best known generic approach.