An Zhu

An Zhu

An Zhu is currently a Staff Software Engineer at Google. Her home page is at: http://www.cs.stanford.edu/~anzhu.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Achieving anonymity via clustering
    Tomás Feder
    Krishnaram Kenthapadi
    Samir Khuller
    Rina Panigrahy
    Dilys Thomas
    ACM Transactions on Algorithms, 6(2010), 49:1-49:19
    Preview abstract Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of de-identifying records is to remove identifying fields such as social security number, name etc. However, recent research has shown that a large fraction of the US population can be identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code. Sweeney proposed the k-anonymity model for privacy where non-key attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for quasi-identifiers. We propose a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published. To ensure privacy of the data records, we impose the constraint that each cluster must contain no fewer than a pre-specified number of data records. This technique is more general since we have a much larger choice for cluster centers than k-Anonymity. In many cases, it lets us release a lot more information without compromising privacy. We also provide constant-factor approximation algorithms to come up with such a clustering. This is the first set of algorithms for the anonymization problem where the performance is independent of the anonymity parameter k. We further observe that a few outlier points can significantly increase the cost of anonymization. Hence, we extend our algorithms to allow an epsilon fraction of points to remain unclustered, i.e., deleted from the anonymized publication. Thus, by not releasing a small fraction of the database records, we can ensure that the data published for analysis has less distortion and hence is more useful. Our approximation algorithms for new clustering objectives are of independent interest and could be applicable in other clustering scenarios as well. View details
    Asymptotic Polynomial-Time Approximation Schemes. Handbook of Approximation Algorithms and Metaheuristics.
    Rajeev Motwani
    Liadan O'Callaghan
    Handbook of Approximation Algorithms and Metaheuristics, Teofilo Gonzalez, ed., Chapman and Hall/CRC(2007)
    Preview abstract Approximation schemes are presented for BIN PACKING, including a PAS due to Vega and Lueker, and an FPAS due to Karmakar and Karp. It is shown that the latter can be modified into an approximation algorithm whose absolute error is bounded by a poly-logarithmic function in the value of the optimal solution. View details
    Achieving Anonymity via Clustering
    Tomás Feder
    Krishnaram Kenthapadi
    Samir Khuller
    Rina Panigrahy
    Dilys Thomas
    Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS)(2006), pp. 153-162
    Preview
    The load rebalancing problem
    Rajeev Motwani
    J. Algorithms, 60(2006), pp. 42-59
    Preview abstract In the classical load balancing or multiprocessor scheduling problem, we are given a sequence of jobs of varying sizes and are asked to assign each job to one of the m empty processors. A typical objective is to minimize the makespan, which is the load on the heaviest loaded processor. Since in most real world scenarios the load is a dynamic measure, the initial assignment may not remain optimal over time. Motivated by such considerations in a variety of systems, we formulate the problem of load rebalancing—given a possibly suboptimal assignment of jobs to processors, relocate a set of the jobs so as to decrease the makespan. Specifically, the goal is to achieve the best possible makespan under the constraint that no more than k jobs are relocated. We also consider the weighted version of this problem where there is an arbitrary cost associated with each job's relocation. The problem is NP-hard and hence, we focus on approximation algorithms. We construct an algorithm which achieves a 1.5-approximation, with near linear running time. We also show that the problem has a PTAS, thereby resolving the complexity issue. Finally, we investigate the approximability of several extensions of the load rebalancing model. View details
    Algorithms for the Database Layout Problem
    Tomás Feder
    Rajeev Motwani
    Rina Panigrahy
    ICDT(2005), pp. 189-203
    Finding Longest Increasing and Common Subsequences in Streaming Data
    David Liben-Nowell
    Erik Vee
    COCOON(2005), pp. 263-272
    Anonymizing Tables
    Tomás Feder
    Krishnaram Kenthapadi
    Rajeev Motwani
    Rina Panigrahy
    Dilys Thomas
    ICDT(2005), pp. 246-258
    Algorithms for Multi-product Pricing
    Tomás Feder
    Rajeev Motwani
    ICALP(2004), pp. 72-83
    Analysis of queueing policies in QoS switches
    J. Algorithms, 53(2004), pp. 137-168
    Combining request scheduling with web caching
    Tomás Feder
    Rajeev Motwani
    Rina Panigrahy
    Steven S. Seiden
    Rob van Stee
    Theor. Comput. Sci., 324(2004), pp. 201-218
    Modeling correlations in web traces and implications for designing replacement policies
    Konstantinos Psounis
    Balaji Prabhakar
    Rajeev Motwani
    Computer Networks, 45(2004), pp. 379-398
    Competitive queueing policies for QoS switches
    Nir Andelman
    Yishay Mansour
    SODA(2003), pp. 761-770
    Discrete Mobile Centers
    Jie Gao
    Leonidas J. Guibas
    John Hershberger
    Li Zhang
    Discrete & Computational Geometry, 30(2003), pp. 45-63
    The load rebalancing problem
    Rajeev Motwani
    SPAA(2003)
    Switch Scheduling via Randomized Edge Coloring
    Rajeev Motwani
    Devavrat Shah
    FOCS(2003)
    Web caching with request reordering
    Tom
    Rajeev Motwani
    Rina Panigrahy
    SODA(2002), pp. 104-105
    The General Steiner Tree-Star problem
    Samir Khuller
    Inf. Process. Lett., 84(2002), pp. 215-220
    Geometric spanner for routing in mobile networks
    Jie Gao
    Leonidas J. Guibas
    John Hershberger
    Li Zhang
    MobiHoc(2001), pp. 45-55
    Algorithms for minimizing weighted flow time
    Chandra Chekuri
    Sanjeev Khanna
    STOC(2001), pp. 84-93
    Approximation algorithms for data placement on parallel disks
    Leana Golubchik
    Sanjeev Khanna
    Samir Khuller
    Ramakrishna Thurimella
    SODA(2000), pp. 223-232
    A Uniform Framework for Approximating Weighted Connectivity Problems
    Samir Khuller
    Balaji Raghavachari
    SODA(1999), pp. 937-938