Mariana Raykova

I work in the areas of cryptography and security. I am interested both in theoretical work that develops new cryptographic tools and in applied cryptography projects that use and implement cryptographic protocols in systems to enhance their security properties. My research includes work on secure computation, oblivious data structures, zero knowledge and verifiable computation, and obfuscation.

I received my PhD from the Computer Science Department of Columbia University and I was co-advised by Tal Malkin and Steve Bellovin. After my PhD I spent a year as a postdoc at the Cryptography Group at IBM Research Watson. I was a Research Scientist at the Computer Science Laboratory at SRI International between 2013 and 2015. Following that I was an Assistant Professor at the Department of Computer Science at Yale University between 2016 and 2018. I joined Google as a Research Scientist in 2019.
Authored Publications
    Secure Poisson Regression
    Mahimna Kelkar
    Phi Hung Le
    USENIX Security Symposium (2022) (to appear)
    We introduce the first construction for secure two-party computation of Poisson regression, which enables two parties who hold shares of the input samples to learn only the resulting Poisson model while protecting the privacy of the inputs. Our construction relies on new protocols for secure fixed-point exponentiation and correlated matrix multiplications. Our secure exponentiation construction avoids expensive bit decomposition and achieves orders of magnitude improvement in both online and offline costs over state-of-the-art works. As a result, the dominant cost of our secure Poisson regression is matrix multiplications with one fixed matrix. We introduce a new technique, called correlated Beaver triples, which enables many such multiplications at the cost of roughly one matrix multiplication. This further brings down the cost of secure Poisson regression. We implement our constructions and show their extreme efficiency. In a LAN setting, our secure exponentiation for 20-bit fractional precision takes less than 0.07ms with a batch size of 100,000. One iteration of secure Poisson regression on a dataset with 10,000 samples with 1000 binary features needs about 65.82s in the offline phase, 55.14s in the online phase and 17MB total communication. For several real datasets this translates into training that takes seconds and only a couple of MB of communication.
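    The abstract above builds on Beaver triples. As a point of reference, here is a minimal sketch of the standard (uncorrelated) Beaver-triple multiplication of two additively shared scalars. It is not the paper's correlated variant or its implementation: the ring size `MOD`, the helper names, and the trusted-dealer `make_triple` that stands in for the offline phase are all assumptions made for illustration.

```python
import secrets

MOD = 2**64  # ring for additive secret sharing (assumed for this sketch)

def share(x):
    """Split x into two additive shares modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD

def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD

def make_triple():
    """Hypothetical trusted dealer: shares of random a, b and of c = a*b."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share((a * b) % MOD)

def beaver_multiply(x_sh, y_sh, triple):
    """Return shares of x*y given shares of x, y and one Beaver triple."""
    (a0, a1), (b0, b1), (c0, c1) = triple
    # Each party masks its shares locally; d = x - a and e = y - b are opened.
    d = reconstruct((x_sh[0] - a0) % MOD, (x_sh[1] - a1) % MOD)
    e = reconstruct((y_sh[0] - b0) % MOD, (y_sh[1] - b1) % MOD)
    # x*y = c + d*b + e*a + d*e; only party 0 adds the public d*e term.
    z0 = (c0 + d * b0 + e * a0 + d * e) % MOD
    z1 = (c1 + d * b1 + e * a1) % MOD
    return z0, z1

x, y = 1234, 5678
product = reconstruct(*beaver_multiply(share(x), share(y), make_triple()))
```

    In this standard form every multiplication consumes a fresh triple; the abstract's point is that when one operand is a fixed matrix, many multiplications can be served at roughly the cost of one.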
    We consider the computation of sparse, (ε, ϑ)-differentially private (DP) histograms in the two-server model of secure multi-party computation (MPC), which has recently gained traction in the context of privacy-preserving measurements of aggregate user data. We introduce protocols that enable two semi-honest non-colluding servers to compute histograms over the data held by multiple users, while only learning a private view of the data. Our solution achieves the same asymptotic ℓ∞-error of O(log(1/ϑ)/ε) as in the central model of DP, but without relying on a trusted curator. The server communication and computation costs of our protocol are independent of the number of histogram buckets, and are linear in the number of users, while the client cost is independent of the number of users, ε, and ϑ. Its linear dependence on the number of users lets our protocol scale well, which we confirm using microbenchmarks: for a billion users, ε = 0.5, and ϑ = 10^-11, the per-user cost of our protocol is only 1.08 ms of server computation and 339 bytes of communication. In contrast, a baseline protocol using garbled circuits only allows up to 10^6 users, where it requires 600 KB communication per user.
    Communication–Computation Trade-offs in PIR
    Asra Ali
    Tancrède Lepoint
    Sarvar Patel
    Phillipp Schoppmann
    Kevin Yeo
    30th USENIX Security Symposium (2021)
    We study the computation and communication costs and their possible trade-offs in various constructions for private information retrieval (PIR), including schemes based on homomorphic encryption and the Gentry–Ramzan PIR (ICALP'05). We improve over the construction of SealPIR (S&P'18) using compression techniques and a new oblivious expansion, which reduce the communication bandwidth by 80% while preserving essentially the same computation cost. We then present MulPIR, a PIR protocol additionally leveraging multiplicative homomorphism to implement the recursion steps in PIR. While using the multiplicative homomorphism has been considered in prior work, we observe that in combination with our other techniques, it introduces a meaningful tradeoff by significantly reducing communication, at the cost of an increased computational cost for the server, when the databases have large entries. For some applications, we show that this could reduce the total monetary server cost by up to 35%. On the other end of the communication–computation spectrum, we take a closer look at Gentry–Ramzan PIR, a scheme with asymptotically optimal communication rate. Here, the bottleneck is the server's computation, which we manage to reduce significantly. Our optimizations enable a tunable tradeoff between communication and computation, which allows us to reduce server computation by as much as 85%, at the cost of an increased query size. Finally, we introduce new ways to handle PIR over sparse databases (keyword PIR), based on different hashing techniques. We implement all of our constructions, and compare their communication and computation overheads with respect to each other for several application scenarios.
    Private Join and Compute from PIR with Default
    Tancrède Lepoint
    Sarvar Patel
    Ni Trieu
    Asiacrypt 2021 (2021)
    The private join and compute (PJC) functionality enables secure computation over data distributed across different databases, and is applicable to a wide range of applications, many of which address settings where the input databases are of significantly different sizes. We introduce the notion of private information retrieval (PIR) with default, which enables two-party PJC functionalities in a way that hides the size of the intersection of the two databases and incurs sublinear communication cost in the size of the bigger database. We provide two constructions for this functionality, one of which requires offline linear communication, which can be amortized across queries, and one that provides sublinear cost for each query but relies on more computationally expensive tools. We construct inner-product PJC, which has applications to ads conversion measurement and contact tracing, relying on an extension of PIR with default. We evaluate the efficiency of our constructions, which can enable 28 PIR with default lookups on a database of size 2^25 (or inner-product PJC on databases with such sizes) with the communication of 44 MB, which costs less than 0.17 c. for the client and 26.48 c. for the server.
    Private Intersection-Sum Protocols with Applications to Attributing Aggregate Ad Conversions
    Mihaela Ion
    Benjamin Kreuter
    Erhan Nergiz
    Sarvar Patel
    Shobhit Saxena
    David Shanahan
    2020 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 370-389
    In this work, we discuss our successful efforts for industry deployment of a cryptographic secure computation protocol. The problem we consider is privately computing aggregate conversion rate of advertising campaigns. This underlying functionality can be abstracted as Private Intersection-Sum (PI-Sum) with Cardinality. In this setting two parties hold datasets containing user identifiers, and one of the parties additionally has an integer value associated with each of its user identifiers. The parties want to learn the number of identifiers they have in common and the sum of the integer values associated with these users without revealing any more information about their private inputs. We identify the major properties and enabling factors which make the deployment of a cryptographic protocol possible, practical, and uniquely positioned as a solution for the task at hand. We describe our deployment setting and the most relevant efficiency measure, which in our setting is communication overhead rather than computation. We also present a monetary cost model that can be used as a unifying cost measure and the computation model that reflects our use case: low-priority batch computing. We present three PI-Sum with cardinality protocols: our currently deployed protocol, which relies on a Diffie-Hellman style double masking, and two new protocols which leverage more recent techniques for private set intersection (PSI) that use Random Oblivious Transfer and encrypted Bloom filters. We compare the latter two protocols with our original solution when instantiated with different additively homomorphic encryption schemes. We implement our constructions and compare their costs. We also compare with recent generic approaches for computing on the intersection of two datasets and show that our best protocol has monetary cost that is 20× less than the best known generic approach.
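    To make the Diffie-Hellman style double masking mentioned above concrete, the following is a toy, single-process sketch of the matching step only. It is not the deployed protocol: the modulus `P` is an insecure illustrative choice, the function names are hypothetical, the shuffling is omitted, and the associated values are carried in plaintext where a real protocol would keep them under additively homomorphic encryption.

```python
import hashlib
import secrets

# Toy modulus (assumption): the Mersenne prime 2**127 - 1. NOT a secure group choice.
P = 2**127 - 1

def hash_to_group(identifier):
    """Map an identifier to a nonzero element modulo P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def private_intersection_sum(ids_a, pairs_b):
    """ids_a: A's identifiers. pairs_b: B's (identifier, value) pairs.

    Returns (cardinality, sum). Both parties are simulated in one process.
    """
    k_a = secrets.randbelow(P - 3) + 2  # A's secret exponent
    k_b = secrets.randbelow(P - 3) + 2  # B's secret exponent

    # Round 1 (A -> B): A sends H(a)^k_a for each of its identifiers.
    a_once = [pow(hash_to_group(i), k_a, P) for i in ids_a]
    # Round 2 (B -> A): B raises A's elements to k_b (a real run also shuffles)
    # and sends H(b)^k_b next to each value (encrypted in a real protocol).
    a_twice = {pow(e, k_b, P) for e in a_once}
    b_once = [(pow(hash_to_group(i), k_b, P), v) for i, v in pairs_b]
    # Round 3: A raises B's elements to k_a. Because exponentiation commutes,
    # common identifiers produce the same doubly masked value H(x)^(k_a*k_b).
    matches = [v for e, v in b_once if pow(e, k_a, P) in a_twice]
    return len(matches), sum(matches)

result = private_intersection_sum(
    ["alice", "bob", "carol"],
    [("bob", 5), ("dave", 7), ("carol", 11)],
)
```

    Here `result` holds the number of shared identifiers and the sum of their values; neither party's raw identifiers are compared directly, only their doubly masked images.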
    Secure aggregation is a cryptographic primitive that enables a server to learn the sum of the vector inputs of many clients. Bonawitz et al. (CCS 2017) presented a construction that incurs computation and communication for each client linear in the number of parties. While this functionality enables a broad range of privacy preserving computational tasks, scaling concerns limit its scope of use. We present the first constructions for secure aggregation that achieve polylogarithmic communication and computation per client. Our constructions provide security in the semi-honest and the semi-malicious setting where the adversary controls the server and a γ-fraction of the clients, and correctness with up to δ-fraction dropouts among the clients. Our constructions show how to replace the complete communication graph of Bonawitz et al., which entails the linear overheads, with a k-regular graph of logarithmic degree while maintaining the security guarantees. Beyond improving the known asymptotics for secure aggregation, our constructions also achieve very efficient concrete parameters. The semi-honest secure aggregation can handle a billion clients at the per client cost of the protocol of Bonawitz et al. for a thousand clients. In the semi-malicious setting with 10^4 clients, each client needs to communicate only with 3% of the clients to have a guarantee that its input has been added together with the inputs of at least 5000 other clients, while withstanding up to 5% corrupt clients and 5% dropouts. We also show an application of secure aggregation to the task of secure shuffling which enables the first cryptographically secure instantiation of the shuffle model of differential privacy.
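    The core idea of masking over a sparse graph, as described in the abstract above, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's construction: it uses a fixed circulant graph of even degree `k` in place of the paper's graph, samples the pairwise masks centrally instead of deriving them from key agreement, and ignores dropouts and corrupt clients. All names are hypothetical.

```python
import secrets

MOD = 2**32  # modulus for vector entries (assumed for this sketch)

def ring_neighbors(i, n, k):
    """Neighbors of client i in a circulant graph of even degree k < n."""
    half = k // 2
    return ([(i + d) % n for d in range(1, half + 1)]
            + [(i - d) % n for d in range(1, half + 1)])

def mask_inputs(inputs, k):
    """Add one pairwise mask per graph edge; the masks cancel in the total."""
    n, dim = len(inputs), len(inputs[0])
    assert k % 2 == 0 and k < n
    masked = [list(v) for v in inputs]
    for i in range(n):
        for j in ring_neighbors(i, n, k):
            if i < j:  # handle each undirected edge exactly once
                m = [secrets.randbelow(MOD) for _ in range(dim)]
                for t in range(dim):
                    masked[i][t] = (masked[i][t] + m[t]) % MOD  # i adds
                    masked[j][t] = (masked[j][t] - m[t]) % MOD  # j subtracts
    return masked

def aggregate(vectors):
    """Server side: coordinate-wise sum of the masked vectors."""
    return [sum(col) % MOD for col in zip(*vectors)]

inputs = [[i, 2 * i, 3] for i in range(8)]
total = aggregate(mask_inputs(inputs, k=4))
```

    Each client exchanges masks with only `k` neighbors rather than all `n - 1` other clients, which is where the per-client savings come from; the server still recovers the exact sum because every mask is added once and subtracted once.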
    Two-Sided Malicious Security for Private Intersection-Sum with Cardinality
    Peihan Miao
    Sarvar Patel
    Advances in Cryptology – CRYPTO 2020 (2020), pp. 3-33
    Private intersection-sum with cardinality allows two parties, where each party holds a private set and one of the parties additionally holds a private integer value associated with each element in her set, to jointly compute the cardinality of the intersection of the two sets as well as the sum of the associated integer values for all the elements in the intersection, and nothing beyond that. We present a new construction for private intersection-sum with cardinality that provides malicious security with abort and guarantees that both parties receive the output upon successful completion of the protocol. A central building block for our constructions is a primitive called shuffled distributed oblivious PRF (DOPRF), which is a PRF that offers oblivious evaluation using a secret key shared between two parties and, in addition, allows obliviously permuting the PRF outputs of several parallel oblivious evaluations. We present the first construction for shuffled DOPRF with malicious security. We further present several new sigma proof protocols for relations across Pedersen commitments, ElGamal encryptions, and Camenisch-Shoup encryptions that we use in our main construction, for which we develop new batching techniques to reduce communication. We implement and evaluate the efficiency of our protocol and show that we can achieve communication cost that is only 4-5 times greater than the most efficient semi-honest protocol. When measuring monetary cost of executing the protocol in the cloud, our protocol is 25 times more expensive than the semi-honest protocol. Our construction also allows for different parameter regimes that enable trade-offs between communication and computation.
    Advances and Open Problems in Federated Learning
    Brendan Avent
    Aurélien Bellet
    Mehdi Bennis
    Arjun Nitin Bhagoji
    Graham Cormode
    Rachel Cummings
    Rafael G.L. D'Oliveira
    Salim El Rouayheb
    David Evans
    Josh Gardner
    Adrià Gascón
    Phillip B. Gibbons
    Marco Gruteser
    Zaid Harchaoui
    Chaoyang He
    Lie He
    Zhouyuan Huo
    Justin Hsu
    Martin Jaggi
    Tara Javidi
    Gauri Joshi
    Mikhail Khodak
    Jakub Konečný
    Aleksandra Korolova
    Farinaz Koushanfar
    Sanmi Koyejo
    Tancrède Lepoint
    Yang Liu
    Prateek Mittal
    Richard Nock
    Ayfer Özgür
    Rasmus Pagh
    Ramesh Raskar
    Dawn Song
    Weikang Song
    Sebastian U. Stich
    Ziteng Sun
    Florian Tramèr
    Praneeth Vepakomma
    Jianyu Wang
    Li Xiong
    Qiang Yang
    Felix X. Yu
    Han Yu
    Arxiv (2019)
    Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and mitigates many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents a comprehensive list of open problems and challenges.