Karn Seth

Karn Seth

I'm a Software Engineer in the Privacy and Data Protection Office (PDPO), where I work mainly on various flavors of Private Join and Compute, and am a primary maintainer of the associated open source repository. More broadly, I'm very interested in the use of Secure Multiparty Computation for privacy preserving measurement, analytics and machine learning, with a special focus on practical deployment.

In the past, I've worked with folks on Federated Machine Learning and Analytics, including the development of the Secure Aggregation protocol, which was featured in the Federated Learning Comic.

I've had the pleasure of hosting/co-hosting an incredible group of summer interns:

I completed my PhD in 2016 under the supervision of Dr. Rafael Pass. Here's a link to my External Google Scholar Profile.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Secure Poisson Regression
    Mahimna Kelkar
    Phi Hung Le
    USENIX Security Symposium (2022) (to appear)
    Preview abstract We introduce the first construction for secure two-party computation of Poisson regression, which enables two parties who hold shares of the input samples to learn only the resulting Poisson model while protecting the privacy of the inputs. Our construction relies on new protocols for secure fixed-point exponentiation and correlated matrix multiplications. Our secure exponentiation construction avoids expensive bit decomposition and achieves orders of magnitude improvement in both online and offline costs over state of the art works. As a result, the dominant cost for our secure Poisson regression are matrix multiplications with one fixed matrix. We introduce a new technique, called correlated Beaver triples, which enables many such multiplications at the cost of roughly one matrix multiplication. This further brings down the cost of secure Poisson regression. We implement our constructions and show their extreme efficiency. In a LAN setting, our secure exponentiation for 20-bit fractional precision takes less than 0.07ms with a batch-size of 100,000. One iteration of secure Poisson regression on a dataset with 10, 000 samples with 1000 binary features needs about 65.82s in the offline phase, 55.14s in the online phase and 17MB total communication. For several real datasets this translates into training that takes seconds and only a couple of MB communication View details
    Private Join and Compute from PIR with Default
    Tancrède Lepoint
    Sarvar Patel
    Ni Trieu
    Asiacrypt 2021 (2021)
    Preview abstract The private join and compute (PJC) functionality enables secure computation over data distributed across different databases, and is applicable to a wide range of applications, many of which address settings where the input databases are of significantly different sizes. We introduce the notion of private information retrieval (PIR) with default, which enables two-party PJC functionalities in a way that hides the size of the intersection of the two databases and incurs sublinear communication cost in the size of the bigger database. We provide two constructions for this functionality, one of which requires offline linear communication, which can be amortized across queries, and one that provides sublinear cost for each query but relies on more computationally expensive tools. We construct inner-product PJC, which has applications to ads conversion measurement and contact tracing, relying on an extension of PIR with default. We evaluate the efficiency of our constructions, which can enable 28 PIR with default lookups on a database of size 2^25 (or inner-product PJC on databases with such sizes) with the communication of 44 MB, which costs less than 0.17 c. for the client and 26.48 c. for the server. View details
    Preview abstract This document describes a secure mechanism to join sets of heterogeneous ids from multiple data providers (e.g. broadcasters, publishers, data enrichment providers) and create a set of encrypted common identifiers, which we refer to as SUMIDs. These identifiers can be used for computing multi-party reach, frequency, sales lift, MTA, and other ads metrics. We also introduce the concept of “match rules”, which dictate when two heterogeneous IDs should be assigned the same SUMID within the secure mechanism. We avoid prescribing specific match rules as these could vary depending upon a number of considerations, such as the specific media market or whether the SUMID is intended to represent a household or an individual. Optimal choice of match rules are also an open area of research. View details
    Communication–Computation Trade-offs in PIR
    Asra Ali
    Tancrède Lepoint
    Sarvar Patel
    Phillipp Schoppmann
    Kevin Yeo
    30th USENIX Security Symposium (2021)
    Preview abstract We study the computation and communication costs and their possible trade-offs in various constructions for private information retrieval (PIR), including schemes based on homomorphic encryption and the Gentry–Ramzan PIR (ICALP'05). We improve over the construction of SealPIR (S&P'18) using compression techniques and a new oblivious expansion, which reduce the communication bandwidth by 80% while preserving essentially the same computation cost. We then present MulPIR, a PIR protocol additionally leveraging multiplicative homomorphism to implement the recursion steps in PIR. While using the multiplicative homomorphism has been considered in prior work, we observe that in combination with our other techniques, it introduces a meaningful tradeoff by significantly reducing communication, at the cost of an increased computational cost for the server, when the databases have large entries. For some applications, we show that this could reduce the total monetary server cost by up to 35%. On the other end of the communication–computation spectrum, we take a closer look at Gentry–Ramzan PIR, a scheme with asymptotically optimal communication rate. Here, the bottleneck is the server's computation, which we manage to reduce significantly. Our optimizations enable a tunable tradeoff between communication and computation, which allows us to reduce server computation by as much as 85%, at the cost of an increased query size. Finally, we introduce new ways to handle PIR over sparse databases (keyword PIR), based on different hashing techniques. We implement all of our constructions, and compare their communication and computation overheads with respect to each other for several application scenarios. View details
    Private Intersection-Sum Protocols with Applications to Attributing Aggregate Ad Conversions
    Mihaela Ion
    Benjamin Kreuter
    Erhan Nergiz
    Sarvar Patel
    Shobhit Saxena
    David Shanahan
    2020 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 370-389
    Preview abstract In this work, we discuss our successful efforts for industry deployment of a cryptographic secure computation protocol. The problem we consider is privately computing aggregate conversion rate of advertising campaigns. This underlying functionality can be abstracted as Private Intersection-Sum (PI-Sum) with Cardinality. In this setting two parties hold datasets containing user identifiers, and one of the parties additionally has an integer value associated with each of its user identifiers. The parties want to learn the number of identifiers they have in common and the sum of the integer values associated with these users without revealing any more information about their private inputs. We identify the major properties and enabling factors which make the deployment of a cryptographic protocol possible, practical, and uniquely positioned as a solution for the task at hand. We describe our deployment setting and the most relevant efficiency measure, which in our setting is communication overhead rather than computation. We also present a monetary cost model that can be used as a unifying cost measure and the computation model which reflect out use-case: a low-priority batch computing. We present three PI-Sum with cardinality protocols: our currently deployed protocol, which relies on a Diffie-Hellman style double masking, and two new protocols which leverage more recent techniques for private set intersection (PSI) that use Random Oblivious Transfer and encrypted Bloom filters. We compare the later two protocol with our original solution when instantiated with different additively homomorphic encryption schemes. We implement our constructions and compare their costs. We also compare with recent generic approaches for computing on the intersection of two datasets and show that our best protocol has monetary cost that is 20× less than the best known generic approach. View details
    Two-Sided Malicious Security for Private Intersection-Sum with Cardinality
    Peihan Miao
    Sarvar Patel
    Advances in Cryptology – CRYPTO 2020 (2020), pp. 3-33
    Preview abstract Private intersection-sum with cardinality allows two parties, where each party holds a private set and one of the parties additionally holds a private integer value associated with each element in her set, to jointly compute the cardinality of the intersection of the two sets as well as the sum of the associated integer values for all the elements in the intersection, and nothing beyond that. We present a new construction for private intersection sum with cardinality that provides malicious security with abort and guarantees that both parties receive the output upon successful completion of the protocol. A central building block for our constructions is a primitive called shuffled distributed oblivious PRF (DOPRF), which is a PRF that offers oblivious evaluation using a secret key shared between two parties, and in addition to this allows obliviously permuting the PRF outputs of several parallel oblivious evaluations. We present the first construction for shuffled DOPRF with malicious security. We further present several new sigma proof protocols for relations across Pedersen commitments, ElGamal encryptions, and Camenisch-Shoup encryptions that we use in our main construction, for which we develop new batching techniques to reduce communication. We implement and evaluate the efficiency of our protocol and show that we can achieve communication cost that is only 4-5 times greater than the most efficient semi-honest protocol. When measuring monetary cost of executing the protocol in the cloud, our protocol is 25 times more expensive than the semi-honest protocol. Our construction also allows for different parameter regimes that enable trade-offs between communication and computation. View details
    Preview abstract We design a novel, communication-efficient, failure-robust protocol for secure aggregation of high-dimensional data. Our protocol allows a server to collect an aggregate of user-held data from mobile devices in a privacy-preserving manner, and can be used, for example, in a federated learning setting, to aggregate user-provided model updates for a deep neural network. We prove the security of our protocol in the honest-but-curious and malicious server settings, and show that privacy is preserved even if an arbitrarily chosen subset of users drop out at any time. We evaluate the efficiency of our protocol and show, by complexity analysis and a concrete implementation, that its runtime and communication overhead remain low even on large data sets and client pools. For 16-bit input values, our protocol offers 1.73× communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98× expansion for 2^14 users and 2^24-dimensional vectors. View details
    Practical Secure Aggregation for Federated Learning on User-Held Data
    Vladimir Ivanov
    Ben Kreuter
    Antonio Marcedone
    Sarvar Patel
    NIPS Workshop on Private Multi-Party Machine Learning (2016)
    Preview abstract Secure Aggregation is a class of Secure Multi-Party Computation algorithms wherein a group of mutually distrustful parties u ∈ U each hold a private value x_u and collaborate to compute an aggregate value, such as the sum_{u∈U} x_u, without revealing to one another any information about their private value except what is learnable from the aggregate value itself. In this work, we consider training a deep neural network in the Federated Learning model, using distributed gradient descent across user-held training data on mobile devices, wherein Secure Aggregation protects the privacy of each user’s model gradient. We identify a combination of efficiency and robustness requirements which, to the best of our knowledge, are unmet by existing algorithms in the literature. We proceed to design a novel, communication-efficient Secure Aggregation protocol for high-dimensional data that tolerates up to 1/3 users failing to complete the protocol. For 16-bit input values, our protocol offers 1.73x communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98x expansion for 2^14 users and 2^24 dimensional vectors. View details