Sara Ahmadian
Sara Ahmadian is a research scientist in the Geo Algorithms research team, which is part of the broader Athena org. Sara earned degrees in Combinatorics and Optimization (M.M. 2010, Ph.D. 2017) from University of Waterloo, where she was advised by Chaitanya Swamy and supported by a NSERC Fellowship. Sara is a recipient of 2017 University of Waterloo Outstanding Achievement in Graduate Studies (Ph.D.) designation for her PhD thesis. She worked as a software developer for a start-up company in Waterloo after completing her Master's and before starting her PhD. Prior to that, She completed Bachelor of Science in Computer Engineering in Sharif University of Technology, Tehran, Iran. Her research interests include efficient machine learning, combinatorial optimization, approximation algorithm, and design and analysis of algorithms.
Authored Publications
Sort By
Efficient Location Sampling Algorithms for Road Networks
Vivek Kumar
Ameya Velingker
Santhoshini Velusamy
WebConf (2024)
Preview abstract
Many geographic information systems applications rely on the data provided by user devices in the road network. Such applications include traffic monitoring, driving navigation, detecting road closures or the construction of new roads, etc. This signal is collected by sampling locations from the user trajectories and is a critical process for all such systems. Yet, it has not been sufficiently studied in the literature. The most natural way to sample a trajectory is perhaps using a frequency based algorithm, e.g., sample every $x$ seconds. However, as we argue in this paper, such a simple strategy can be very wasteful in terms of resources (e.g., server-side processing, user battery) and in terms of the amount of user data that it maintains. In this work we conduct a horizontal study of various location sampling algorithms (including frequency-based, road geography-based, reservoir-sampling based, etc.) and extract their trade-offs in terms of various metrics of interest, such as, the size of the stored data and the induced quality of training for prediction tasks (e.g., predicting speeds) using the road network of New York City.
View details
KwikBucks: Correlation Clustering with Cheap-Weak and Expensive-Strong Signals
Sandeep Silwal
Andrew Nystrom
Andrew McCallum
International Conference in Learning Representation (ICLR) (2023) (to appear)
Preview abstract
The unprecedented rate at which the sizes of machine learning (ML) models are growing necessitates novel approaches to enable efficient and scalable solutions. We contribute to this line of work by studying a novel version of the Budgeted Correlation Clustering problem where along with a limited number of queries to an expensive oracle for node similarities (e.g. a large ML model), we have unlimited access to a cheaper but less accurate second oracle. Our formulation is inspired by many practical scenarios where coarse approximations of the expensive similarity metric can be efficiently obtained via weaker models. We develop a theoretically motivated algorithm in this setting that leverages the cheap oracle to judiciously query the strong oracle while maintaining high clustering quality. We empirically demonstrate gains in query minimization and clustering metrics on a variety of datasets with diverse strong and cheap oracles. Most notably, we demonstrate a practical application in text clustering based on expensive cross-attention language models by showing that cheaper (but weaker) embedding-based models can be leveraged to substantially reduce the number of inference calls to the former.
View details
Distributed load balancing: a new framework and improved guarantees
Allen Liu
Binghui Peng
Innovations in Theoretical Computer Science (2021)
Preview abstract
Inspired by applications on search engines and web servers, we consider a load balancing problem with a general \textit{convex} objective function. In this problem, we are given a bipartite graph on a set of sources $S$ and a set of workers $W$ and the goal is to distribute the load from each source among its neighboring workers such that the total load of workers are as balanced as possible.
We present a new distributed algorithm that works with \textit{any} symmetric non-decreasing convex function for evaluating the balancedness of the workers' load.
Our algorithm computes a nearly optimal allocation of loads in $O(\log n \log^2 d/\eps^3)$ rounds where $n$ is the number of nodes, $d$ is the maximum degree, and $\eps$ is the desired precision. If the objective is to minimize the maximum load, we modify the algorithm to obtain a nearly optimal solution in $O(\log n \log d/\eps^2)$ rounds. This improves a line of algorithms that require a polynomial number of rounds in $n$ and $d$ and appear to encounter a fundamental barrier that prevents them from obtaining poly-logarithmic runtime
\cite{berenbrink2005dynamic, berenbrink2009new, subramanian1994analysis, rabani1998local}. In our paper, we introduce a novel primal-dual approach with multiplicative weight updates that allows us to circumvent this barrier. Our algorithm is inspired by \cite{agrawal2018proportional} and other distributed algorithms for optimizing linear objectives but introduces several new twists to deal with general convex objectives.
View details
Fair Correlation clustering
23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020) (2020) (to appear)
Preview abstract
In this paper, we study correlation clustering under fairness constraints. Fair variants of k-median and k-center clustering have been studied recently, and approximation algorithms using a notion called fairlet decomposition have been proposed. We obtain approximation algorithms for fair correlation clustering under several important types of fairness constraints.
Our results hinge on obtaining a fairlet decomposition for correlation clustering by introducing a novel combinatorial optimization problem. We define a fairlet decomposition with cost similar to the k-median cost and this allows us to obtain approximation algorithms for a wide range of fairness constraints.
We complement our theoretical results with an in-depth analysis of our algorithms on real graphs where we show that fair solutions to correlation clustering can be obtained with limited increase in cost compared to the state-of-the-art (unfair) algorithms.
View details
Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection
Vaggos Chatziafratis
AISTATS, AISTATS, AISTATS (2020), AISTATS (to appear)
Preview abstract
Hierarchical Clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation and to address this, Dasgupta recently introduced an optimization view-point of hierarchical clustering with pair-
wise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage), but also designing new algorithms. Here, for the maximization dual of Dasgupta’s objec-
tive (introduced by Moseley-Wang), we present polynomial-time 42.46% approximation algorithms that use Max-Uncut Bisection as a subroutine. The previous best worst-case approximation factor in polynomial time was 33.6%, improving only slightly over Average-Linkage which achieves 33.3%. Finally, we complement our positive results by providing APX-hardness (even for 0-1 similarities), under the Small Set Expansion hypothesis.
View details
Fair Hierarchical Clustering
Benjamin Moseley
Marina Knittel
Yuyan Wang
Neurips 2020
Preview abstract
As machine learning has become more and more integrated into our businesses and lifestyles, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering.
In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a certain objective~\cite{dasgupta}. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, surprisingly, with only a negligible loss in the objective.
View details
Preview abstract
Clustering is a fundamental problem in unsupervised machine learning. In many applications, clustering needs to be performed in presence of additional constraints, such as fairness or diversity constraints.
In this paper, we formulate the problem of k-center clustering without over-representation, and propose approximation algorithms to solve the problem, as well as hardness results. We empirically evaluate our clusterings on real-world dataset and show that fairness can be obtained with limited effect on clustering quality.
View details