Hector Yee

Hector Yee has been a research engineer at Google since January 2007. He earned his MS in computer graphics from Cornell University in 2000 and spent several years in the computer games and feature animation industries, working on hit movies such as Shrek, before moving to machine learning at Google. His interests are in statistical machine learning and its applications, particularly to text, video, images and, more recently, recommendation systems. http://www.linkedin.com/profile/view?id=1937667&trk=tab_pro
Authored Publications
    Community search signatures as foundation features for human-centered geospatial modeling
    Chaitanya Kamath
    Mohit Agarwal
    David Schottlander
    Shailesh Bavadekar
    Niv Efron
    Shravya Shetty
    ICML 2024 Workshop on Data-Centric Machine Learning Research
Abstract: Aggregated relative search frequencies offer a unique composite signal reflecting people's habits, concerns, interests, intents, and general information needs, which are not found in other readily available datasets. Temporal search trends have been successfully used to perform nowcasting across a variety of domains such as infectious diseases, unemployment rates, and retail sales. However, most existing applications require curating specialized datasets of individual keywords, queries, or query clusters, and the search data need to be temporally aligned with the outcome variable of interest. We propose a novel approach for generating an aggregated and anonymized representation of search interest as foundation features at the community level for geospatial modeling. We benchmark these features using spatial datasets across multiple domains. In regions with a population greater than 3,000 that cover over 95% of the contiguous US population, our models achieve an average R-squared score of 0.74 across 21 health variables, and 0.80 across 6 demographic and environmental variables. Our results demonstrate that these search features can be used for spatial predictions without strict temporal alignment, and that the resulting models outperform spatial interpolation and state-of-the-art methods using satellite imagery features.
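As an illustration only (not the authors' pipeline), the sketch below shows the general shape of such a benchmark: fit a simple regularized linear model on per-region aggregated search-frequency features and report R-squared on held-out regions. All data here are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic placeholders: one row per region.
# X: aggregated, anonymized relative search-frequency features.
# y: a health outcome variable for the same regions.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 256))
y = X[:, :8].sum(axis=1) + rng.normal(scale=0.5, size=5000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("held-out R^2:", r2_score(y_te, model.predict(X_te)))
```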
    General Geospatial Inference with a Population Dynamics Foundation Model
    Chaitanya Kamath
    Prithul Sarker
    Joydeep Paul
    Yael Mayer
    Sheila de Guia
    Jamie McPike
    Adam Boulanger
    David Schottlander
    Yao Xiao
    Manjit Chakravarthy Manukonda
    Monica Bharel
    Von Nguyen
    Luke Barrington
    Niv Efron
    Krish Eswaran
    Shravya Shetty
(2024, to appear)
Abstract: Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations, and researchers to understand and reason over complex relationships between human behavior and local contexts. This support includes identifying populations at elevated risk and gauging where to target limited aid resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new, or even related, tasks. To address this, we introduce the Population Dynamics Foundation Model (PDFM), which aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on geospatial interpolation across all tasks, surpassing existing location encoders based on satellite and geotagged imagery. In addition, it achieves state-of-the-art performance in extrapolation and super-resolution for 25 of the 27 tasks. We also show that the PDFM can be combined with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers. In conclusion, we have demonstrated a general-purpose approach to geospatial modeling tasks critical to understanding population dynamics by leveraging a rich set of complementary globally available datasets that can be readily adapted to previously unseen machine learning tasks.
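The abstract notes that the embeddings and sample code are publicly released; the snippet below is only a hedged sketch of the pattern it describes, fitting a lightweight head on frozen region embeddings, with synthetic arrays standing in for the real embeddings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-ins for the released PDFM embeddings: one frozen
# d-dimensional embedding per region, produced upstream by a graph
# neural network over behavioral and environmental signals.
rng = np.random.default_rng(1)
n_regions, d = 3000, 128
embeddings = rng.normal(size=(n_regions, d))
target = embeddings[:, :5].sum(axis=1) + rng.normal(scale=0.3, size=n_regions)

# The embeddings stay fixed; each downstream task fits only a
# lightweight head on top of them.
head = LinearRegression().fit(embeddings[:2400], target[:2400])
print("held-out R^2:", r2_score(target[2400:], head.predict(embeddings[2400:])))
```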
    Scalable and accurate deep learning for electronic health records
    Alvin Rishi Rajkomar
    Eyal Oren
    Nissan Hajaj
    Mila Hardt
    Peter J. Liu
    Xiaobing Liu
    Jake Marcus
    Patrik Per Sundberg
    Kun Zhang
    Yi Zhang
    Gerardo Flores
    Gavin Duggan
    Jamie Irvine
    Kurt Litsch
    Alex Mossin
    Justin Jesada Tansuwan
    De Wang
    Dana Ludwig
    Samuel Volchenboum
    Kat Chou
    Michael Pearson
    Srinivasan Madabushi
    Nigam Shah
    Atul Butte
    npj Digital Medicine (2018)
Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient’s chart.
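As a deliberately simplified stand-in for the paper's deep sequence models (which consume the full time-ordered FHIR record), the sketch below shows only the task framing: predict a binary in-hospital mortality label from per-patient event-code counts and evaluate with AUROC. All data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-in: each patient is a count vector over event codes
# drawn from their record (the paper instead feeds the full
# time-ordered FHIR sequence, including notes, to deep models).
rng = np.random.default_rng(0)
n_patients, n_codes = 2000, 500
X = rng.poisson(0.05, size=(n_patients, n_codes)).astype(float)
y = (X[:, :10].sum(axis=1) + rng.normal(size=n_patients) > 1).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])
print("AUROC:", roc_auc_score(y[1500:], clf.predict_proba(X[1500:])[:, 1]))
```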
    Affinity Weighted Embedding
    Jason Weston
    International Conference on Machine Learning (2014)
Abstract: Supervised linear embedding models like Wsabie (Weston et al., 2011) and supervised semantic indexing (Bai et al., 2010) have proven successful at ranking, recommendation and annotation tasks. However, despite being scalable to large datasets, they do not take full advantage of the extra data due to their linear nature, and we believe they typically underfit. We propose a new class of models which aim to provide improved performance while retaining many of the benefits of the existing class of embedding models. Our approach works by reweighting each component of the embedding of features and labels with a potentially nonlinear affinity function. We describe several variants of the family, and show its usefulness on several datasets.
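A minimal sketch of the idea, assuming one illustrative choice of affinity (a per-component Gaussian kernel; the paper describes several variants): each component of the linear embeddings is reweighted by a nonlinear function of the pair before scoring.

```python
import numpy as np

def affinity_weighted_score(x, y, U, V, sigma=1.0):
    """Score a (feature, label) pair by linear embeddings whose
    components are reweighted by a nonlinear affinity function.
    The per-component Gaussian kernel is an illustrative choice,
    not necessarily the variant used in the paper."""
    ex, ey = U @ x, V @ y                          # linear embeddings
    affinity = np.exp(-(ex - ey) ** 2 / (2 * sigma ** 2))
    return float(np.sum(affinity * ex * ey))

# Toy usage with random parameters.
rng = np.random.default_rng(0)
U, V = rng.normal(size=(32, 100)), rng.normal(size=(32, 50))
x, y = rng.normal(size=100), rng.normal(size=50)
print(affinity_weighted_score(x, y, U, V))
```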
    Label Partitioning for Sublinear Ranking
    Jason Weston
    International Conference on Machine Learning (2013)
    Nonlinear Latent Factorization by Embedding Multiple User Interests
    Jason Weston
    ACM International Conference on Recommender Systems (RecSys) (2013)
Abstract: Classical matrix factorization approaches to collaborative filtering learn a latent vector for each user and each item, and recommendations are scored via the similarity between two such vectors, which are of the same dimension. In this work, we are motivated by the intuition that a user is a much more complicated entity than any single item, and cannot be well described by the same representation. Hence, the variety of a user’s interests could be better captured by a more complex representation. We propose to model the user with a richer set of functions, specifically via a set of latent vectors, where each vector captures one of the user’s latent interests or tastes. The overall recommendation model is then nonlinear, where the matching score between a user and a given item is the maximum matching score over each of the user’s latent interests with respect to the item’s latent representation. We describe a simple, general and efficient algorithm for learning such a model, and apply it to large scale, real world datasets from YouTube and Google Music, where our approach outperforms existing techniques.
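The scoring rule follows directly from the abstract: a user is represented by T latent interest vectors, and an item's score is the best match over those interests. A minimal sketch:

```python
import numpy as np

def score(user_interests, item_vec):
    """Nonlinear matching score: the best match between any of the
    user's T latent interest vectors and the item's latent vector."""
    # user_interests: (T, d) array, one latent vector per interest.
    # item_vec: (d,) item embedding.
    return float(np.max(user_interests @ item_vec))

# Toy usage: a user with T = 3 latent interests in a 64-dim space.
rng = np.random.default_rng(0)
user_interests = rng.normal(size=(3, 64))
item_vec = rng.normal(size=64)
print(score(user_interests, item_vec))
```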
    Affinity Weighted Embedding
    Jason Weston
    International Conference on Learning Representations (2013)
    Learning to Rank Recommendations with the k-Order Statistic Loss
    Jason Weston
    ACM International Conference on Recommender Systems (RecSys) (2013)
Abstract: Making recommendations by learning to rank is becoming an increasingly studied area. Approaches that use stochastic gradient descent scale well to large collaborative filtering datasets, and it has been shown how to approximately optimize the mean rank, or more recently the top of the ranked list. In this work we present a family of loss functions, the k-order statistic loss, that includes these previous approaches as special cases, and also derives new ones that we show to be useful. In particular, we present (i) a new variant that more accurately optimizes precision at k, and (ii) a novel procedure of optimizing the mean maximum rank, which we hypothesize is useful to more accurately cover all of the user’s tastes. The general approach works by sampling N positive items, ordering them by the score assigned by the model, and then weighting the example as a function of this ordered set. Our approach is studied in two real-world systems, Google Music and YouTube video recommendations, where we obtain improvements for computable metrics, and in the YouTube case, increased user click-through and watch duration when deployed live on www.youtube.com.
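A minimal sketch of the positive-selection step described above, assuming a hypothetical scoring function score_fn. Taking the k-th best sampled positive is the special case the title names (position-dependent weighting generalizes it), and the full method pairs this with a ranking update against sampled negatives, omitted here.

```python
import numpy as np

def k_os_pick_positive(score_fn, user, pos_items, n_sample, k, rng):
    """Sample N of the user's positive items, order them by model
    score, and return the k-th best as the positive for this update.
    Small k emphasizes precision at the top of the list; larger k
    pushes up positives the model currently ranks poorly."""
    sample = rng.choice(pos_items, size=min(n_sample, len(pos_items)),
                        replace=False)
    scores = np.array([score_fn(user, item) for item in sample])
    return sample[np.argsort(-scores)[k - 1]]  # descending score order

# Toy usage with an arbitrary deterministic scoring function.
rng = np.random.default_rng(0)
score_fn = lambda u, i: hash((u, int(i))) % 1000 / 1000.0
print(k_os_pick_positive(score_fn, "user42", list(range(20)),
                         n_sample=10, k=3, rng=rng))
```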