Karn Seth
I'm a Software Engineer in the Privacy and Data Protection Office (PDPO), where I work mainly on various flavors of Private Join and Compute, and am a primary maintainer of the associated open source repository. More broadly, I'm very interested in the use of Secure Multiparty Computation for privacy preserving measurement, analytics and machine learning, with a special focus on practical deployment.
In the past, I've worked with folks on Federated Machine Learning and Analytics, including the development of the Secure Aggregation protocol, which was featured in the Federated Learning Comic.
I've had the pleasure of hosting/co-hosting an incredible group of summer interns:
- 2021,2022: Stan Peceny (Georgia Tech), Amit Agarwal (University of Illinois, Urbana Champaign)
- 2020: Mahimna Kelkar (Cornell Tech), Le Phi Hung (Now at Google)
- 2019: Ni Trieu (Now at Arizona State University), Marshall Ball (Now at NYU)
- 2018: Peihan Miao (Now at Brown University)
- 2017: Jonathan Frankle (MIT)
- 2016: Antonio Marcedone (Now at Zoom)
Research Areas
Authored Publications
Sort By
Preview abstract
We introduce the first construction for secure two-party computation of Poisson regression,
which enables two parties who hold shares of the input samples to learn only the resulting
Poisson model while protecting the privacy of the inputs.
Our construction relies on new protocols for secure fixed-point exponentiation and correlated matrix multiplications. Our secure exponentiation construction avoids expensive bit
decomposition and achieves orders of magnitude improvement in both online and offline costs
over state of the art works. As a result, the dominant cost for our secure Poisson regression
are matrix multiplications with one fixed matrix. We introduce a new technique, called correlated Beaver triples, which enables many such multiplications at the cost of roughly one matrix
multiplication. This further brings down the cost of secure Poisson regression.
We implement our constructions and show their extreme efficiency. In a LAN setting, our
secure exponentiation for 20-bit fractional precision takes less than 0.07ms with a batch-size of
100,000. One iteration of secure Poisson regression on a dataset with 10, 000 samples with 1000
binary features needs about 65.82s in the offline phase, 55.14s in the online phase and 17MB
total communication. For several real datasets this translates into training that takes seconds
and only a couple of MB communication
View details
Preview abstract
The private join and compute (PJC) functionality enables secure computation over data distributed across different databases, and is applicable to a wide range of applications, many of which address settings where the input databases are of significantly different sizes.
We introduce the notion of private information retrieval (PIR) with default, which enables two-party PJC functionalities in a way that hides the size of the intersection of the two databases and incurs sublinear communication cost in the size of the bigger database. We provide two constructions for this functionality, one of which requires offline linear communication, which can be amortized across queries, and one that provides sublinear cost for each query but relies on more computationally expensive tools. We construct inner-product PJC, which has applications to ads conversion measurement and contact tracing, relying on an extension of PIR with default. We evaluate the efficiency of our constructions, which can enable 28 PIR with default lookups on a database of size 2^25 (or inner-product PJC on databases with such sizes) with the communication of 44 MB, which costs less than 0.17 c. for the client and 26.48 c. for the server.
View details
Preview abstract
This document describes a secure mechanism to join sets of heterogeneous ids from multiple data providers (e.g. broadcasters, publishers, data enrichment providers) and create a set of encrypted common identifiers, which we refer to as SUMIDs. These identifiers can be used for computing multi-party reach, frequency, sales lift, MTA, and other ads metrics. We also introduce the concept of “match rules”, which dictate when two heterogeneous IDs should be assigned the same SUMID within the secure mechanism. We avoid prescribing specific match rules as these could vary depending upon a number of considerations, such as the specific media market or whether the SUMID is intended to represent a household or an individual. Optimal choice of match rules are also an open area of research.
View details
Communication–Computation Trade-offs in PIR
Asra Ali
Tancrède Lepoint
Sarvar Patel
Phillipp Schoppmann
Kevin Yeo
30th USENIX Security Symposium (2021)
Preview abstract
We study the computation and communication costs and their possible trade-offs in various constructions for private information retrieval (PIR), including schemes based on homomorphic encryption and the Gentry–Ramzan PIR (ICALP'05).
We improve over the construction of SealPIR (S&P'18) using compression techniques and a new oblivious expansion, which reduce the communication bandwidth by 80% while preserving essentially the same computation cost. We then present MulPIR, a PIR protocol additionally leveraging multiplicative homomorphism to implement the recursion steps in PIR. While using the multiplicative homomorphism has been considered in prior work, we observe that in combination with our other techniques, it introduces a meaningful tradeoff by significantly reducing communication, at the cost of an increased computational cost for the server, when the databases have large entries. For some applications, we show that this could reduce the total monetary server cost by up to 35%.
On the other end of the communication–computation spectrum, we take a closer look at Gentry–Ramzan PIR, a scheme with asymptotically optimal communication rate. Here, the bottleneck is the server's computation, which we manage to reduce significantly. Our optimizations enable a tunable tradeoff between communication and computation, which allows us to reduce server computation by as much as 85%, at the cost of an increased query size.
Finally, we introduce new ways to handle PIR over sparse databases (keyword PIR), based on different hashing techniques. We implement all of our constructions, and compare their communication and computation overheads with respect to each other for several application scenarios.
View details
Private Intersection-Sum Protocols with Applications to Attributing Aggregate Ad Conversions
Mihaela Ion
Benjamin Kreuter
Erhan Nergiz
Sarvar Patel
Shobhit Saxena
David Shanahan
2020 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 370-389
Preview abstract
In this work, we discuss our successful efforts for
industry deployment of a cryptographic secure computation
protocol. The problem we consider is privately computing aggregate conversion rate of advertising campaigns.
This underlying functionality can be abstracted as Private
Intersection-Sum (PI-Sum) with Cardinality. In this setting
two parties hold datasets containing user identifiers, and one
of the parties additionally has an integer value associated
with each of its user identifiers. The parties want to learn
the number of identifiers they have in common and the sum
of the integer values associated with these users without
revealing any more information about their private inputs.
We identify the major properties and enabling factors
which make the deployment of a cryptographic protocol
possible, practical, and uniquely positioned as a solution for
the task at hand. We describe our deployment setting and
the most relevant efficiency measure, which in our setting is
communication overhead rather than computation. We also
present a monetary cost model that can be used as a unifying
cost measure and the computation model which reflect out
use-case: a low-priority batch computing.
We present three PI-Sum with cardinality protocols: our
currently deployed protocol, which relies on a Diffie-Hellman
style double masking, and two new protocols which leverage
more recent techniques for private set intersection (PSI) that
use Random Oblivious Transfer and encrypted Bloom filters.
We compare the later two protocol with our original solution
when instantiated with different additively homomorphic
encryption schemes. We implement our constructions and
compare their costs. We also compare with recent generic
approaches for computing on the intersection of two datasets
and show that our best protocol has monetary cost that is
20× less than the best known generic approach.
View details
Two-Sided Malicious Security for Private Intersection-Sum with Cardinality
Peihan Miao
Sarvar Patel
Advances in Cryptology – CRYPTO 2020 (2020), pp. 3-33
Preview abstract
Private intersection-sum with cardinality allows two parties, where each party holds a private set and one of the parties additionally holds a private integer value associated with each element in her set, to jointly compute the cardinality of the intersection of the two sets as well as the sum of the associated integer values for all the elements in the intersection, and nothing beyond that.
We present a new construction for private intersection sum with cardinality that provides malicious security with abort and guarantees that both parties receive the output upon successful completion of the protocol. A central building block for our constructions is a primitive called shuffled distributed oblivious PRF (DOPRF), which is a PRF that offers oblivious evaluation using a secret key shared between two parties, and in addition to this allows obliviously permuting the PRF outputs of several parallel oblivious evaluations. We present the first construction for shuffled DOPRF with malicious security. We further present several new sigma proof protocols for relations across Pedersen commitments, ElGamal encryptions, and Camenisch-Shoup encryptions that we use in our main construction, for which we develop new batching techniques to reduce communication.
We implement and evaluate the efficiency of our protocol and show that we can achieve communication cost that is only 4-5 times greater than the most efficient semi-honest protocol. When measuring monetary cost of executing the protocol in the cloud, our protocol is 25 times more expensive than the semi-honest protocol. Our construction also allows for different parameter regimes that enable trade-offs between communication and computation.
View details
Practical Secure Aggregation for Privacy-Preserving Machine Learning
Antonio Marcedone
Benjamin Kreuter
Sarvar Patel
Vladimir Ivanov
CCS (2017)
Preview abstract
We design a novel, communication-efficient, failure-robust protocol for secure aggregation of high-dimensional data. Our protocol allows a server to collect an aggregate of user-held data from mobile devices in a privacy-preserving manner, and can be used, for example, in a federated learning setting, to aggregate user-provided model updates for a deep neural network. We prove the security of our protocol in the honest-but-curious and malicious server settings, and show that privacy is preserved even if an arbitrarily chosen subset of users drop out at any time. We evaluate the efficiency of our protocol and show, by complexity analysis and a concrete implementation, that its runtime and communication overhead remain low even on large data sets and client pools. For 16-bit input values, our protocol offers 1.73× communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98× expansion for 2^14 users and 2^24-dimensional vectors.
View details
Practical Secure Aggregation for Federated Learning on User-Held Data
Vladimir Ivanov
Ben Kreuter
Antonio Marcedone
Sarvar Patel
NIPS Workshop on Private Multi-Party Machine Learning (2016)
Preview abstract
Secure Aggregation is a class of Secure Multi-Party Computation algorithms wherein a group of
mutually distrustful parties u ∈ U each hold a private value x_u and collaborate to compute an
aggregate value, such as the sum_{u∈U} x_u, without revealing to one another any information about
their private value except what is learnable from the aggregate value itself. In this work, we consider
training a deep neural network in the Federated Learning model, using distributed gradient descent
across user-held training data on mobile devices, wherein Secure Aggregation protects the privacy of
each user’s model gradient. We identify a combination of efficiency and robustness requirements
which, to the best of our knowledge, are unmet by existing algorithms in the literature. We proceed to
design a novel, communication-efficient Secure Aggregation protocol for high-dimensional data that
tolerates up to 1/3 users failing to complete the protocol. For 16-bit input values, our protocol offers
1.73x communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98x expansion
for 2^14 users and 2^24 dimensional vectors.
View details