Arbaaz Muslim
Arbaaz is a researcher on the Health AI team at Google, where he is currently working on a foundation model that captures population dynamics. He earned his MS in Electrical Engineering and Computer Science from UC Berkeley in 2023, with a thesis focusing on protein language modeling. In industry, he has worked at various health-tech startups, designing and implementing data infrastructure to incorporate patient data into machine learning models. In academia, he contributed to research projects on neural tracking and neural receptive field prediction. Arbaaz's interests lie in leveraging machine learning and data engineering to improve healthcare outcomes.
Authored Publications
Community search signatures as foundation features for human-centered geospatial modeling
Chaitanya Kamath
Mohit Agarwal
David Schottlander
Shailesh Bavadekar
Niv Efron
Shravya Shetty
ICML 2024 Workshop on Data-Centric Machine Learning Research
Aggregated relative search frequencies offer a unique composite signal reflecting people's habits, concerns, interests, intents, and general information needs, which are not found in other readily available datasets. Temporal search trends have been successfully used to perform nowcasting across a variety of domains such as infectious diseases, unemployment rates, and retail sales. However, most existing applications require curating specialized datasets of individual keywords, queries, or query clusters, and the search data need to be temporally aligned with the outcome variable of interest. We propose a novel approach for generating an aggregated and anonymized representation of search interest as foundation features at the community level for geospatial modeling. We benchmark these features using spatial datasets across multiple domains. In regions with a population greater than 3000 that cover over 95% of the contiguous US population, our models achieve an average R-squared score of 0.74 across 21 health variables, and 0.80 across 6 demographic and environmental variables. Our results demonstrate that these search features can be used for spatial predictions without strict temporal alignment, and that the resulting models outperform spatial interpolation and state-of-the-art methods using satellite imagery features.
General Geospatial Inference with a Population Dynamics Foundation Model
Chaitanya Kamath
Shravya Shetty
David Schottlander
Yael Mayer
Joydeep Paul
Jamie McPike
Sheila de Guia
Niv Efron
(2024) (to appear)
Understanding complex relationships between human behavior and local contexts is crucial for various applications in public health, social science, and environmental studies. Traditional approaches often make use of small sets of manually curated, domain-specific variables to represent human behavior, and struggle to capture these intricate connections, particularly when dealing with diverse data types. To address this challenge, this work introduces a novel approach that leverages the power of graph neural networks (GNNs). We first construct a large dataset encompassing human-centered variables aggregated at postal code and county levels across the United States. This dataset captures rich information on human behavior (internet search behavior and mobility patterns) along with environmental factors (local facility availability, temperature, and air quality). Next, we propose a GNN-based framework designed to encode the connections between these diverse features alongside the inherent spatial relationships between postal codes and their containing counties. We then demonstrate the effectiveness of our approach by benchmarking the model on 27 target variables spanning three distinct domains: health, socioeconomic factors, and environmental measurements. Through spatial interpolation, extrapolation, and super-resolution tasks, we show that the proposed method can effectively utilize the rich feature set to achieve accurate predictions across diverse geospatial domains.