Sara Ahmadian

Sara Ahmadian

Sara Ahmadian is a research scientist in the Geo Algorithms research team, which is part of the broader Athena org. Sara earned degrees in Combinatorics and Optimization (M.M. 2010, Ph.D. 2017) from University of Waterloo, where she was advised by Chaitanya Swamy and supported by a NSERC Fellowship. Sara is a recipient of 2017 University of Waterloo Outstanding Achievement in Graduate Studies (Ph.D.) designation for her PhD thesis. She worked as a software developer for a start-up company in Waterloo after completing her Master's and before starting her PhD. Prior to that, She completed Bachelor of Science in Computer Engineering in Sharif University of Technology, Tehran, Iran. Her research interests include efficient machine learning, combinatorial optimization, approximation algorithm, and design and analysis of algorithms.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Many geographic information systems applications rely on the data provided by user devices in the road network. Such applications include traffic monitoring, driving navigation, detecting road closures or the construction of new roads, etc. This signal is collected by sampling locations from the user trajectories and is a critical process for all such systems. Yet, it has not been sufficiently studied in the literature. The most natural way to sample a trajectory is perhaps using a frequency based algorithm, e.g., sample every $x$ seconds. However, as we argue in this paper, such a simple strategy can be very wasteful in terms of resources (e.g., server-side processing, user battery) and in terms of the amount of user data that it maintains. In this work we conduct a horizontal study of various location sampling algorithms (including frequency-based, road geography-based, reservoir-sampling based, etc.) and extract their trade-offs in terms of various metrics of interest, such as, the size of the stored data and the induced quality of training for prediction tasks (e.g., predicting speeds) using the road network of New York City. View details
    Preview abstract Online navigation platforms are well optimized to solve for the standard objective of minimizing the travel time and typically require precomputation-based architectures (such as Contraction Hierarchies and the Customizable Route Planning) to do so in a fast manner. The reason for this dependence is the size of the graph that represents the road network, which is large. The need to go beyond minimizing the travel time and introduce various types of customizations has led to approaches that rely on alternative route computation or, more generally, small subgraph extraction. On a small subgraph, one is able to run computationally expensive algorithms at query time and compute optimal solutions for multiple routing problems. In this framework, it is critical for the subgraph to a) be small and b) include (near) optimal routes for a collection of customizations. This is precisely the setting that we study in this work. We design algorithms that extract a subgraph connecting designated terminals with the objective to minimize the subgraph's size and the constraint to include near optimal routes for a set of predefined cost functions. We provide theoretical guarantees for our algorithms and evaluate them empirically using real world road networks. View details
    KwikBucks: Correlation Clustering with Cheap-Weak and Expensive-Strong Signals
    Sandeep Silwal
    Andrew Nystrom
    Andrew McCallum
    International Conference in Learning Representation (ICLR)(2023) (to appear)
    Preview abstract The unprecedented rate at which the sizes of machine learning (ML) models are growing necessitates novel approaches to enable efficient and scalable solutions. We contribute to this line of work by studying a novel version of the Budgeted Correlation Clustering problem where along with a limited number of queries to an expensive oracle for node similarities (e.g. a large ML model), we have unlimited access to a cheaper but less accurate second oracle. Our formulation is inspired by many practical scenarios where coarse approximations of the expensive similarity metric can be efficiently obtained via weaker models. We develop a theoretically motivated algorithm in this setting that leverages the cheap oracle to judiciously query the strong oracle while maintaining high clustering quality. We empirically demonstrate gains in query minimization and clustering metrics on a variety of datasets with diverse strong and cheap oracles. Most notably, we demonstrate a practical application in text clustering based on expensive cross-attention language models by showing that cheaper (but weaker) embedding-based models can be leveraged to substantially reduce the number of inference calls to the former. View details
    Preview abstract Inspired by applications on search engines and web servers, we consider a load balancing problem with a general \textit{convex} objective function. In this problem, we are given a bipartite graph on a set of sources $S$ and a set of workers $W$ and the goal is to distribute the load from each source among its neighboring workers such that the total load of workers are as balanced as possible. We present a new distributed algorithm that works with \textit{any} symmetric non-decreasing convex function for evaluating the balancedness of the workers' load. Our algorithm computes a nearly optimal allocation of loads in $O(\log n \log^2 d/\eps^3)$ rounds where $n$ is the number of nodes, $d$ is the maximum degree, and $\eps$ is the desired precision. If the objective is to minimize the maximum load, we modify the algorithm to obtain a nearly optimal solution in $O(\log n \log d/\eps^2)$ rounds. This improves a line of algorithms that require a polynomial number of rounds in $n$ and $d$ and appear to encounter a fundamental barrier that prevents them from obtaining poly-logarithmic runtime \cite{berenbrink2005dynamic, berenbrink2009new, subramanian1994analysis, rabani1998local}. In our paper, we introduce a novel primal-dual approach with multiplicative weight updates that allows us to circumvent this barrier. Our algorithm is inspired by \cite{agrawal2018proportional} and other distributed algorithms for optimizing linear objectives but introduces several new twists to deal with general convex objectives. View details
    Fair Correlation clustering
    23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020)(2020) (to appear)
    Preview abstract In this paper, we study correlation clustering under fairness constraints. Fair variants of k-median and k-center clustering have been studied recently, and approximation algorithms using a notion called fairlet decomposition have been proposed. We obtain approximation algorithms for fair correlation clustering under several important types of fairness constraints. Our results hinge on obtaining a fairlet decomposition for correlation clustering by introducing a novel combinatorial optimization problem. We define a fairlet decomposition with cost similar to the k-median cost and this allows us to obtain approximation algorithms for a wide range of fairness constraints. We complement our theoretical results with an in-depth analysis of our algorithms on real graphs where we show that fair solutions to correlation clustering can be obtained with limited increase in cost compared to the state-of-the-art (unfair) algorithms. View details
    Preview abstract Hierarchical Clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation and to address this, Dasgupta recently introduced an optimization view-point of hierarchical clustering with pair- wise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage), but also designing new algorithms. Here, for the maximization dual of Dasgupta’s objec- tive (introduced by Moseley-Wang), we present polynomial-time 42.46% approximation algorithms that use Max-Uncut Bisection as a subroutine. The previous best worst-case approximation factor in polynomial time was 33.6%, improving only slightly over Average-Linkage which achieves 33.3%. Finally, we complement our positive results by providing APX-hardness (even for 0-1 similarities), under the Small Set Expansion hypothesis. View details
    Preview abstract As machine learning has become more and more integrated into our businesses and lifestyles, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering. In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a certain objective~\cite{dasgupta}. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, surprisingly, with only a negligible loss in the objective. View details
    Preview abstract Clustering is a fundamental problem in unsupervised machine learning. In many applications, clustering needs to be performed in presence of additional constraints, such as fairness or diversity constraints. In this paper, we formulate the problem of k-center clustering without over-representation, and propose approximation algorithms to solve the problem, as well as hardness results. We empirically evaluate our clusterings on real-world dataset and show that fairness can be obtained with limited effect on clustering quality. View details