An Zhu
An Zhu is currently a Staff Software Engineer at Google. Her home page is at: http://www.cs.stanford.edu/~anzhu.
Research Areas
Authored Publications
Google Publications
Other Publications
Sort By
Achieving anonymity via clustering
Tomás Feder
Krishnaram Kenthapadi
Samir Khuller
Rina Panigrahy
Dilys Thomas
ACM Transactions on Algorithms, vol. 6 (2010), 49:1-49:19
Preview abstract
Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of
de-identifying records is to remove identifying fields such as social security number, name etc. However, recent research has shown that a large fraction of the US population can be
identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code. Sweeney proposed the k-anonymity model for privacy where non-key
attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for
quasi-identifiers. We propose a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published. To
ensure privacy of the data records, we impose the constraint that each cluster must contain no fewer than a pre-specified number of data records. This technique is more general since we have a much larger choice for cluster centers than k-Anonymity. In many cases, it lets us release a lot more information without compromising privacy. We also provide constant-factor approximation algorithms to come up with such a clustering. This is the first set of algorithms for the anonymization problem where the performance is independent of the anonymity parameter k. We further observe that a few outlier points can significantly increase the cost of anonymization. Hence, we extend our algorithms to allow an epsilon fraction of points to remain unclustered, i.e., deleted from the anonymized publication. Thus, by not releasing a small fraction of the database records, we can ensure that the data published for analysis has less distortion and hence is more useful. Our approximation algorithms for new clustering objectives are of independent interest and could be applicable in other clustering scenarios as well.
View details
Asymptotic Polynomial-Time Approximation Schemes. Handbook of Approximation Algorithms and Metaheuristics.
Rajeev Motwani
Liadan O'Callaghan
Handbook of Approximation Algorithms and Metaheuristics, Teofilo Gonzalez, ed., Chapman and Hall/CRC (2007)
Preview abstract
Approximation schemes are presented for BIN
PACKING, including a PAS due to Vega and Lueker, and an FPAS due to
Karmakar and Karp. It is shown that the latter can be modified into an
approximation algorithm whose absolute error is bounded by a
poly-logarithmic function in the value of the optimal solution.
View details
Achieving Anonymity via Clustering
Preview
Tomás Feder
Krishnaram Kenthapadi
Samir Khuller
Rina Panigrahy
Dilys Thomas
Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS) (2006), pp. 153-162
The load rebalancing problem
Anonymizing Tables
Tomás Feder
Krishnaram Kenthapadi
Rajeev Motwani
Rina Panigrahy
Dilys Thomas
ICDT (2005), pp. 246-258
Finding Longest Increasing and Common Subsequences in Streaming Data
Algorithms for the Database Layout Problem
Algorithms for Multi-product Pricing
Combining request scheduling with web caching
Tomás Feder
Rajeev Motwani
Rina Panigrahy
Steven S. Seiden
Rob van Stee
Theor. Comput. Sci., vol. 324 (2004), pp. 201-218
Modeling correlations in web traces and implications for designing replacement policies
Konstantinos Psounis
Balaji Prabhakar
Rajeev Motwani
Computer Networks, vol. 45 (2004), pp. 379-398
Analysis of queueing policies in QoS switches
J. Algorithms, vol. 53 (2004), pp. 137-168
Discrete Mobile Centers
Jie Gao
Leonidas J. Guibas
John Hershberger
Li Zhang
Discrete & Computational Geometry, vol. 30 (2003), pp. 45-63
Switch Scheduling via Randomized Edge Coloring
The load rebalancing problem
Competitive queueing policies for QoS switches
The General Steiner Tree-Star problem
Web caching with request reordering
Algorithms for minimizing weighted flow time
Geometric spanner for routing in mobile networks
Approximation algorithms for data placement on parallel disks
A Uniform Framework for Approximating Weighted Connectivity Problems