Pasin Manurangsi
Authored Publications
We study the allocation of indivisible goods under conflicting constraints, represented by a graph. In this framework, vertices correspond to goods and edges correspond to conflicts between pairs of goods. Each agent is allocated an independent set in the graph. Kumar et al. [2024] recently showed that a maximal EF1 allocation exists for interval graphs and two agents with monotone valuations. We significantly extend this result by establishing that a maximal EF1 allocation exists for *any graph* when the two agents have monotone valuations. To compute such an allocation, we present a polynomial-time algorithm for additive valuations as well as a pseudo-polynomial-time algorithm for monotone valuations. Moreover, we complement our findings with a counterexample demonstrating that a maximal EF1 allocation may not exist for three agents with monotone valuations. Additionally, we establish NP-hardness of determining the existence of such allocations for every fixed number n of agents.
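To make the model concrete, the following Python sketch (with our own helper names, assuming additive valuations) checks whether a given allocation is conflict-free, EF1, and maximal; it illustrates the definitions only and is not the paper's algorithm.

from itertools import combinations

def is_independent(bundle, conflict_edges):
    # No two goods in the bundle may be joined by a conflict edge.
    return not any((u, v) in conflict_edges or (v, u) in conflict_edges
                   for u, v in combinations(bundle, 2))

def is_ef1(allocation, valuations):
    # Standard EF1 for additive valuations: agent i must not envy agent j
    # once some single good is removed from j's bundle.
    for i, own in enumerate(allocation):
        own_value = sum(valuations[i][g] for g in own)
        for j, other in enumerate(allocation):
            if i == j or not other:
                continue
            other_value = sum(valuations[i][g] for g in other)
            # For additive valuations, removing the good that i values most
            # from j's bundle is the best possible single removal.
            best_removal = max(valuations[i][g] for g in other)
            if own_value < other_value - best_removal:
                return False
    return True

def is_maximal(allocation, goods, conflict_edges):
    # Maximality: no unallocated good can be added to any bundle while
    # keeping that bundle an independent set in the conflict graph.
    allocated = set().union(*map(set, allocation))
    for g in set(goods) - allocated:
        for bundle in allocation:
            if is_independent(set(bundle) | {g}, conflict_edges):
                return False
    return True

# Two agents, goods {0, 1, 2} with conflicts 0-1 and 1-2 (a path):
edges = {(0, 1), (1, 2)}
vals = [{0: 3, 1: 5, 2: 3}, {0: 2, 1: 4, 2: 2}]
alloc = [{1}, {0, 2}]
print(is_independent(alloc[1], edges), is_ef1(alloc, vals),
      is_maximal(alloc, [0, 1, 2], edges))  # True True True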
Improved FPT Approximation Scheme and Approximate Kernel for Biclique-Free Max k-Weight SAT: Greedy Strikes Back
Theoretical Computer Science, 1028 (2025)
In the Max k-Weight SAT (a.k.a. Max SAT with Cardinality Constraint) problem, we are given a CNF formula with n variables and m clauses together with a positive integer k. The goal is to find an assignment that sets at most k variables to one and satisfies as many clauses as possible. Recently, Jain et al. (SODA 2023) gave an FPT approximation scheme (FPT-AS) with running time 2^O((dk/ε)^d) * (n + m)^O(1) for Max k-Weight SAT when the incidence graph is K_{d,d}-free. They asked whether a polynomial-size approximate kernel exists. In this work, we answer this question positively by giving a (1 − ε)-approximate kernel with (dk/ε)^O(d) variables. This also implies an improved FPT-AS with running time (dk/ε)^O(dk) * (n+m)^O(1) for the problem. Our approximate kernel is based mainly on a couple of greedy strategies together with a sunflower-lemma-style reduction rule.
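As a reference specification of the problem (deliberately brute-force; this is not the kernelization or FPT-AS from the paper), the following Python sketch enumerates every way to set at most k variables to one:

from itertools import combinations

def max_k_weight_sat(clauses, num_vars, k):
    # Brute-force Max k-Weight SAT. A clause is a set of signed literals,
    # e.g. {1, -3} means (x1 or not x3). Runs in time exponential in k.
    best = 0
    for size in range(k + 1):
        for ones in combinations(range(1, num_vars + 1), size):
            assignment = set(ones)  # variables set to one
            satisfied = sum(
                1 for clause in clauses
                if any((lit > 0 and lit in assignment) or
                       (lit < 0 and -lit not in assignment) for lit in clause)
            )
            best = max(best, satisfied)
    return best

# (x1 or x2), (not x1 or x3), (x2 or not x3); n = 3, k = 1:
print(max_k_weight_sat([{1, 2}, {-1, 3}, {2, -3}], 3, 1))  # 3 (set x2 = 1)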
Improved Lower Bound for Differentially Private Facility Location
Information Processing Letters, 187 (2025)
We consider the differentially private (DP) facility location problem in the so-called super-set output setting proposed by Gupta et al. [GLM+10]. The current best known expected approximation ratio for an ε-DP algorithm is O(log n / √ε), due to Cohen-Addad et al. [CEF+22], where n denotes the size of the metric space; meanwhile, the best known lower bound is Ω(1/√ε) [EGLW19].
In this short note, we give a lower bound of Ω(min{log n, √(log n/ε)}) on the expected approximation ratio of any ε-DP algorithm, which is the first evidence that the approximation ratio has to grow with the size of the metric space.
We study differential privacy (DP) in a multi-party setting where each party only trusts a (known) subset of the other parties with its data. Specifically, given a trust graph where vertices correspond to parties and neighbors are mutually trusting, we give a DP algorithm for aggregation with a much better privacy-utility trade-off than in the well-studied local model of DP (where each party trusts no other party). We further study a robust variant where each party trusts all but an unknown subset of at most t of its neighbors (where t is a given parameter), and give an algorithm for this setting. We complement our algorithms with lower bounds, and discuss implications of our work for other tasks in private learning and analytics.
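As a toy illustration of why trust graphs can help (a simplified construction of ours; the paper's actual algorithm and analysis may differ), each party can hand its value to a trusted party inside a dominating set of the graph, and only those aggregators add noise:

import random

def trust_graph_sum(values, neighbors, dominating_set, eps):
    # Toy sketch: every party either belongs to the dominating set or
    # trusts a neighbor in it; each dominator adds Laplace(1/eps) noise
    # to the partial sum it aggregates. Assumes values lie in [0, 1],
    # so each partial sum has sensitivity 1.
    partial = {d: 0.0 for d in dominating_set}
    for v, x in values.items():
        trusted = [d for d in dominating_set if d == v or d in neighbors[v]]
        partial[random.choice(trusted)] += x
    # Laplace noise as a difference of two exponentials with rate eps.
    noisy = [p + random.expovariate(eps) - random.expovariate(eps)
             for p in partial.values()]
    return sum(noisy)

A smaller dominating set means fewer noise shares in the final sum, which is one intuition for why more trust permits a better privacy-utility trade-off than local DP, where every party must add its own noise.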
Differentially Private Insights into AI Use
Daogao Liu
Pritish Kamath
Alexander Knop
Adam Sealfon
Da Yu
Chiyuan Zhang
Conference on Language Modeling (COLM) 2025 (2025)
We introduce Urania, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy (DP) guarantees. The framework employs a private clustering mechanism and innovative keyword extraction methods, including frequency-based, TF-IDF-based, and LLM-guided approaches. By leveraging DP tools such as clustering, partition selection, and histogram-based summarization, Urania provides end-to-end privacy protection. Our evaluation assesses lexical and semantic content preservation, pair similarity, and LLM-based metrics, benchmarking against a non-private method inspired by CLIO (Tamkin et al., 2024). Moreover, we develop a simple empirical privacy evaluation that demonstrates the enhanced robustness of our DP pipeline. The results show the framework’s ability to extract meaningful conversational insights while maintaining stringent user privacy, effectively balancing data utility with privacy preservation.
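For intuition about the histogram-based summarization step, here is a heavily simplified Python sketch of a DP keyword histogram with threshold-based partition selection; the capping rule, parameter names, and threshold handling are our assumptions, not the exact Urania mechanism:

import random

def dp_keyword_histogram(user_keywords, eps, threshold, max_per_user=5):
    # Cap each user's contribution so a single user changes at most
    # max_per_user counts by 1 (bounding the L1 sensitivity).
    counts = {}
    for keywords in user_keywords:
        for kw in sorted(set(keywords))[:max_per_user]:
            counts[kw] = counts.get(kw, 0) + 1
    released = {}
    for kw, c in counts.items():
        # Laplace(max_per_user / eps) noise via difference of exponentials.
        noise = (random.expovariate(eps / max_per_user)
                 - random.expovariate(eps / max_per_user))
        # Release only keywords whose noisy count clears the threshold;
        # in a real partition-selection mechanism the threshold must be
        # calibrated so that unreleased keywords also stay private.
        if c + noise >= threshold:
            released[kw] = c + noise
    return released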
Differential privacy can be achieved in a distributed manner, where multiple parties add independent noise such that the sum of their noise protects the overall dataset with differential privacy. A common technique here is for each party to sample its noise from the decomposition of an infinitely divisible distribution. We introduce two novel mechanisms in this setting: 1) the generalized discrete Laplace (GDL) mechanism, whose distribution (which is closed under summation) follows from differences of i.i.d. negative binomial shares, and 2) the multi-scale discrete Laplace (MSDLap) mechanism, which follows the sum of multiple i.i.d. discrete Laplace shares at different scales. The mechanisms can be parameterized to have O(Δ^3 e^{−ε}) and O(min(Δ^3 e^{−ε}, Δ^2 e^{−2ε/3})) MSE, respectively, where the latter bound matches known optimality results. Furthermore, the MSDLap mechanism has the optimal MSE, including constants, as ε → ∞. We also show a transformation from the discrete setting to the continuous setting, which allows us to transform both mechanisms to the continuous setting and thereby achieve the optimal O(Δ^2 e^{−2ε/3}) MSE. To our knowledge, these are the first infinitely divisible additive noise mechanisms that achieve order-optimal MSE under pure differential privacy in either the discrete or continuous setting, so our work shows formally that there is no separation in utility when query-independent noise-adding mechanisms are restricted to infinitely divisible noise. For the continuous setting, our result improves upon Pagh and Stausholm's Arete distribution, which gives an MSE of O(Δ^2 e^{−ε/4}) [35]. We apply our results to improve a state-of-the-art multi-message shuffle DP protocol from [3] in the high-ε regime.
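The divisibility idea behind the GDL mechanism can be sanity-checked with a short numpy sketch (ours, not the paper's tuned parameterization): negative binomial distributions are closed under summation, so per-party differences of negative binomial shares sum to a GDL sample.

import numpy as np

def gdl_shares(num_parties, r, p, size=1, rng=None):
    # Each party draws the difference of two i.i.d. NB(r/num_parties, p)
    # variables. Summing the shares across parties yields
    # NB(r, p) - NB(r, p), i.e. a generalized discrete Laplace sample.
    if rng is None:
        rng = np.random.default_rng()
    shape = r / num_parties
    pos = rng.negative_binomial(shape, p, size=(num_parties, size))
    neg = rng.negative_binomial(shape, p, size=(num_parties, size))
    return pos - neg

# Ten parties jointly emit five GDL(r=1, p=0.5) samples. For r = 1,
# NB(1, p) is geometric, so each column sum is a discrete Laplace draw.
print(gdl_shares(num_parties=10, r=1.0, p=0.5, size=5).sum(axis=0))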
Balls-and-Bins Sampling for DP-SGD
Lynn Chua
Charlie Harrison
Pritish Kamath
Ethan Leeman
Amer Sinha
Chiyuan Zhang
AISTATS (2025)
We introduce Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (2024), however, pointed out that shuffling-based DP-SGD can have a much larger privacy cost in practical regimes of parameters. We show that Balls-and-Bins sampling achieves the best of both samplers: its implementation is similar to that of shuffling, and models trained with Balls-and-Bins-based DP-SGD achieve utility comparable to those trained with shuffle-based DP-SGD at the same noise multiplier; yet Balls-and-Bins sampling enjoys similar-or-better privacy amplification compared to Poisson subsampling.
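The sampler itself is easy to state. A minimal numpy sketch (ours; it omits the paper's privacy accounting) assigns each example to one uniformly random optimization step per epoch:

import numpy as np

def balls_and_bins_batches(num_examples, num_steps, rng=None):
    # Each example ("ball") lands in exactly one step ("bin"), so every
    # example is used once per epoch, as with shuffling, while batch
    # sizes are random, as with Poisson subsampling.
    if rng is None:
        rng = np.random.default_rng()
    bins = rng.integers(num_steps, size=num_examples)
    return [np.flatnonzero(bins == t) for t in range(num_steps)]

batches = balls_and_bins_batches(num_examples=1000, num_steps=10)
print([len(b) for b in batches])  # batch sizes concentrate around 100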
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Lynn Chua
Yangsibo Huang
Pritish Kamath
Amer Sinha
Chulin Xie
Chiyuan Zhang
COLM (2025)
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks. We observe that while these models show promising surface-level crosslingual abilities on machine translation and embedding space analyses, they struggle with deeper crosslingual knowledge transfer, revealing a crosslingual knowledge barrier in both general (MMLU benchmark) and domain-specific (Harry Potter quiz and TOFU benchmark) contexts. Since simple inference-time mitigation methods offer only limited improvement, we propose fine-tuning of LLMs on mixed-language data, which effectively reduces these gaps, even when using out-of-domain datasets like WikiText. Our findings suggest the need for explicit optimization to unlock the full crosslingual potential of LLMs. Our code is available at https://github.com/google-research/crosslingual-knowledge-barriers.
Several resource allocation settings involve agents with unequal entitlements represented by weights. We analyze weighted fair division from an asymptotic perspective: if m items are divided among n agents whose utilities are independently sampled from a probability distribution, when is it likely that a fair allocation exists? We show that if the ratio between the weights is bounded, a weighted envy-free allocation exists with high probability provided that m = Ω(n log n / log log n), generalizing a prior unweighted result. For weighted proportionality, we establish a sharp threshold of m = n/(1 − μ) for the transition from non-existence to existence, where μ ∈ (0, 1) denotes the mean of the distribution. In addition, we prove that for two agents, a weighted envy-free (and weighted proportional) allocation is likely to exist if m = ω(√r), where r denotes the ratio between the two weights.
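For reference, the two fairness notions used above can be stated compactly in code for additive valuations (a sketch with hypothetical helper names; these are our renderings of the standard definitions, not code from the paper):

def is_weighted_envy_free(allocation, valuations, weights):
    # Agent i's utility per unit of entitlement must be at least the
    # utility i assigns to any j's bundle per j's entitlement.
    for i, own in enumerate(allocation):
        u_own = sum(valuations[i][g] for g in own) / weights[i]
        for j, other in enumerate(allocation):
            if i != j and u_own < sum(valuations[i][g]
                                      for g in other) / weights[j]:
                return False
    return True

def is_weighted_proportional(allocation, valuations, weights):
    # Each agent must receive at least its entitlement share
    # w_i / sum(w) of its value for the whole set of items.
    total_w = sum(weights)
    for i, own in enumerate(allocation):
        full_value = sum(valuations[i].values())
        if sum(valuations[i][g] for g in own) < weights[i] / total_w * full_value:
            return False
    return True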
On the Differential Privacy and Interactivity of Privacy Sandbox Reports
Charlie Harrison
Pritish Kamath
Alexander Knop
Ethan Leeman
Vikas Sahu
PETS (2025)
The Privacy Sandbox initiative from Google includes APIs for enabling privacy-preserving advertising functionalities as part of the effort to limit third-party cookies. In particular, the Private Aggregation API (PAA) and the Attribution Reporting API (ARA) can be used for ad measurement while providing different guardrails for safeguarding user privacy, including a framework for satisfying differential privacy (DP). In this work, we provide an abstract model for analyzing the privacy of these APIs and show that they satisfy a formal DP guarantee under certain assumptions. Our analysis handles the case where both the queries and database can change interactively based on previous responses from the API.