Gautam Prasad

Gautam Prasad is a researcher on the Mobile Vision team at Google. His work centers on understanding images, videos, audio, and motion sensor data, building datasets and applying a variety of machine learning tools to better model affect, gaze, and intent. Prior to joining Google, he was an Assistant Professor in the Neurology Department at the University of Southern California (USC), where his research focused on using machine learning techniques to mine patterns of human brain connectivity in magnetic resonance imaging (MRI) data.
Authored Publications
    Socio-spatial equity analysis of relative wealth index and emergency obstetric care accessibility in urban Nigeria
    Kerry L. M. Wong
    Aduragbemi Banke-Thomas
    Tope Olubodun
    Peter M. Macharia
    Charlotte Stanton
    Narayanan Sundararajan
    Yash Shah
    Mansi Kansal
    Swapnil Vispute
    Olakunmi Ogunyemi
    Uchenna Gwacham-Anisiobi
    Jia Wang
    Ibukun-Oluwa Omolade Abejirinde
    Prestige Tatenda Makanga
    Bosede B. Afolabi
    Lenka Beňová
    Communications Medicine, vol. 4 (2024), pp. 34
    Background: Better geographical accessibility to comprehensive emergency obstetric care (CEmOC) facilities can significantly improve pregnancy outcomes. However, because other factors, such as affordability, are also critical for care access, it is important to explore accessibility across wealth groups. We assessed CEmOC geographical accessibility by wealth status in the 15 most-populated Nigerian cities. Methods: We mapped city boundaries, verified and geocoded functional CEmOC facilities, and assembled population distributions for women of childbearing age and Meta's Relative Wealth Index (RWI). We used the Google Maps Platform's internal Directions Application Programming Interface to obtain driving times to public and private facilities. City-level median travel time (MTT) and the number of CEmOC facilities reachable within 60 minutes were summarised for peak and non-peak hours per wealth quintile, and the correlation between RWI and MTT to the nearest public CEmOC facility was calculated. Results: MTT to the nearest public CEmOC facility is lowest for the wealthiest 20% in all cities, with the largest gap between the wealthiest 20% and least wealthy 20% in Onitsha (26 vs 81 min) and the smallest in Warri (20 vs 30 min). The average number of public CEmOC facilities reachable within 60 minutes also varies (11 among the wealthiest 20% and six among the least wealthy in Kano), and in five cities no facility is reachable within 60 minutes for the least wealthy 20%. Residents of the suburbs in particular have poor accessibility to CEmOC facilities. Conclusions: The least wealthy mostly have poor accessibility to care, and interventions addressing CEmOC geographical accessibility for poor populations are needed to address inequities in urban settings.
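    A minimal sketch of the kind of city-level equity summary the abstract describes: grouping grid cells into wealth quintiles, computing median travel time and mean facility counts per quintile, and correlating RWI with travel time. The data, column names, and the choice of Spearman correlation are illustrative assumptions, not the authors' exact pipeline.

    ```python
    import pandas as pd
    from scipy.stats import spearmanr

    # Hypothetical per-grid-cell table for one city: Relative Wealth Index (RWI),
    # driving time to the nearest public CEmOC facility, and number of facilities
    # reachable within 60 minutes.
    cells = pd.DataFrame({
        "rwi":              [-0.9, -0.5, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2],
        "travel_time_min":  [  81,   70,   63,  55,  48,  40,  33,  29,  26,  24],
        "facilities_60min": [   0,    1,    1,   2,   3,   4,   6,   8,  10,  11],
    })

    # Assign wealth quintiles from the RWI distribution (Q1 = least wealthy).
    cells["wealth_quintile"] = pd.qcut(cells["rwi"], 5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"])

    # City-level summaries per quintile: median travel time (MTT) and mean number
    # of facilities reachable within 60 minutes.
    summary = cells.groupby("wealth_quintile", observed=True).agg(
        median_travel_time=("travel_time_min", "median"),
        mean_facilities_60min=("facilities_60min", "mean"),
    )
    print(summary)

    # Correlation between wealth and travel time to the nearest public facility.
    rho, p_value = spearmanr(cells["rwi"], cells["travel_time_min"])
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
    ```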
    A geospatial database of close to reality travel times to obstetric emergency care in 15 Nigerian conurbations
    Peter M. Macharia
    Kerry L. M. Wong
    Tope Olubodun
    Lenka Beňová
    Charlotte Stanton
    Narayanan Sundararajan
    Yash Shah
    Mansi Kansal
    Swapnil Vispute
    Uchenna Gwacham-Anisiobi
    Olakunmi Ogunyemi
    Jia Wang
    Ibukun-Oluwa Omolade Abejirinde
    Prestige Tatenda Makanga
    Bosede B. Afolabi
    Aduragbemi Banke-Thomas
    Scientific Data, vol. TBD (2023), TBD
    Travel time estimation that accounts for on-the-ground realities between the location where a need for emergency obstetric care (EmOC) arises and the health facility capable of providing such services is essential for improving maternal and neonatal health outcomes. Current understanding of travel time to care is particularly inadequate in urban areas, where short distances obscure long travel times, and in low-resource settings. Here, we describe a database of travel times to facilities that can provide comprehensive EmOC in the 15 most populated extended urban areas (conurbations) in Nigeria. Travel times from grid cells of approximately 0.6 × 0.6 km to facilities were derived using the Google Maps Platform's internal Directions Application Programming Interface (API), which incorporates traffic estimates to provide closer-to-reality travel times. Computations were done to the first, second, and third nearest public or private facilities. Travel time estimates for eight traffic scenarios (including peak and non-peak periods) and the number of facilities reachable within specific time thresholds were produced. The database offers a plethora of opportunities for research and planning towards improving EmOC accessibility.
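    A sketch of how a gridded travel-time database like the one described might be queried, for example to count how many of the three nearest facilities are reachable within 60 minutes per cell under one traffic scenario. The column names, scenario label, and long-format layout are assumptions for illustration; the published data descriptor defines the real schema.

    ```python
    import pandas as pd

    # Hypothetical long-format table: one row per grid cell, traffic scenario,
    # and facility rank (1st, 2nd, 3rd nearest public or private CEmOC facility).
    records = pd.DataFrame({
        "cell_id":         [101, 101, 101, 102, 102, 102],
        "scenario":        ["weekday_peak"] * 6,
        "facility_rank":   [1, 2, 3, 1, 2, 3],
        "travel_time_min": [35, 52, 66, 72, 90, 110],
    })

    THRESHOLD_MIN = 60

    # Number of the three nearest facilities reachable within the threshold, per cell.
    reachable = (
        records.assign(within=records["travel_time_min"] <= THRESHOLD_MIN)
               .groupby(["cell_id", "scenario"])["within"]
               .sum()
               .rename("facilities_within_60min")
    )
    print(reachable)
    ```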
    Videos can evoke a range of affective responses in viewers. The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation. We introduce the Evoked Expressions from Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos. Each video is annotated at 6 Hz with 15 continuous evoked expression labels, corresponding to the facial expressions of viewers who reacted to the video. We use an expression recognition model within our data collection framework to achieve scalability. In total, there are 36.7 million annotations of viewer facial reactions to 23,574 videos (1,700 hours). We use a publicly available video corpus to obtain a diverse set of video content. We establish baseline performance on the EEV dataset using an existing multimodal recurrent model. Transfer learning experiments show an improvement in performance on the LIRIS-ACCEDE video dataset when pre-trained on EEV. We hope that the size and diversity of the EEV dataset will encourage further explorations in video understanding and affective computing. A subset of EEV is released at https://github.com/google-research-datasets/eev.
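    A minimal single-stream stand-in for the kind of recurrent baseline the abstract describes: per-frame video features in, 15 evoked-expression scores out at each 6 Hz step. The feature dimension, hidden size, and loss are assumptions, and the real baseline is multimodal; this is only an illustrative sketch.

    ```python
    import torch
    import torch.nn as nn

    class FrameGRUBaseline(nn.Module):
        def __init__(self, feature_dim: int = 128, hidden_dim: int = 256, num_expressions: int = 15):
            super().__init__()
            self.gru = nn.GRU(feature_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, num_expressions)

        def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
            # frame_features: (batch, time, feature_dim), sampled at 6 Hz.
            hidden, _ = self.gru(frame_features)
            # One 15-dimensional expression score vector per time step.
            return torch.sigmoid(self.head(hidden))

    # Toy forward/backward pass with random tensors standing in for video features.
    model = FrameGRUBaseline()
    features = torch.randn(4, 60, 128)   # 4 clips, 10 s at 6 Hz
    targets = torch.rand(4, 60, 15)      # viewer expression annotations in [0, 1]
    loss = nn.functional.mse_loss(model(features), targets)
    loss.backward()
    print(loss.item())
    ```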
    Understanding the degree to which human facial expressions co-vary with specific social contexts across cultures is central to the theory that emotions enable adaptive responses to important challenges and opportunities. Concrete evidence linking social context to specific facial expressions is sparse and is largely based on survey-based approaches, which are often constrained by language and small sample sizes. Here, by applying machine-learning methods to real-world, dynamic behaviour, we ascertain whether naturalistic social contexts (for example, weddings or sporting competitions) are associated with specific facial expressions across different cultures. In two experiments using deep neural networks, we examined the extent to which 16 types of facial expression occurred systematically in thousands of contexts in 6 million videos from 144 countries. We found that each kind of facial expression had distinct associations with a set of contexts that were 70% preserved across 12 world regions. Consistent with these associations, regions varied in how frequently different facial expressions were produced as a function of which contexts were most salient. Our results reveal fine-grained patterns in human facial expressions that are preserved across the modern world.
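    An illustrative sketch of one way to quantify how well expression-context associations are preserved across world regions: correlate each region's expression-by-context association matrix with the global pattern. The metric, matrix shapes, and random data are assumptions, not the paper's exact analysis.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    num_regions, num_expressions, num_contexts = 12, 16, 300

    # Hypothetical association scores: how strongly each of 16 expressions
    # co-occurs with each context, estimated separately per region.
    associations = rng.normal(size=(num_regions, num_expressions, num_contexts))

    # Compare each region's association matrix against the pooled global pattern.
    global_pattern = associations.mean(axis=0)
    for region in range(num_regions):
        r = np.corrcoef(associations[region].ravel(), global_pattern.ravel())[0, 1]
        print(f"region {region}: correlation with global pattern = {r:.2f}")
    ```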
    Weakly Supervised Action Localization by Sparse Temporal Pooling Network
    Phuc Nguyen
    Bohyung Han
    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 6752-6761
    We propose a weakly supervised temporal action localization algorithm for untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement for temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module, and fuse the key segments through adaptive temporal pooling. Our loss function comprises two terms that minimize the video-level action classification error and enforce the sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision.
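    A sketch of the core idea described above: a per-segment attention module, attention-weighted temporal pooling into a video-level prediction, and a loss that combines video-level classification with an L1 sparsity term on the attention weights. The feature dimension, class count, and loss weight are illustrative assumptions, not the paper's exact configuration.

    ```python
    import torch
    import torch.nn as nn

    class SparseTemporalPooling(nn.Module):
        def __init__(self, feature_dim: int = 1024, num_classes: int = 20):
            super().__init__()
            self.attention = nn.Sequential(
                nn.Linear(feature_dim, 256), nn.ReLU(),
                nn.Linear(256, 1), nn.Sigmoid(),
            )
            self.classifier = nn.Linear(feature_dim, num_classes)

        def forward(self, segments: torch.Tensor):
            # segments: (batch, num_segments, feature_dim) of per-segment features.
            attn = self.attention(segments)                               # (batch, T, 1)
            pooled = (attn * segments).sum(1) / attn.sum(1).clamp(min=1e-6)
            return self.classifier(pooled), attn.squeeze(-1)

    model = SparseTemporalPooling()
    segments = torch.randn(2, 400, 1024)              # 2 untrimmed videos, 400 segments each
    video_labels = torch.zeros(2, 20)
    video_labels[0, 3] = video_labels[1, 7] = 1.0     # video-level class labels only

    logits, attention = model(segments)
    cls_loss = nn.functional.binary_cross_entropy_with_logits(logits, video_labels)
    sparsity_loss = attention.abs().mean()            # encourage sparse segment selection
    loss = cls_loss + 0.1 * sparsity_loss
    loss.backward()
    print(loss.item())
    ```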
    GLA in MediaEval 2018 Emotional Impact of Movies Task
    Jennifer Jianing Sun
    MediaEval 2018 Multimedia Benchmark Workshop
    The visual and audio information in movies can evoke a variety of emotions in viewers. Towards a better understanding of viewer impact, we present our methods for the MediaEval 2018 Emotional Impact of Movies Task, which requires predicting the expected valence and arousal continuously over movies. This task, based on the LIRIS-ACCEDE dataset, enables researchers to compare different approaches for predicting viewer impact from movies. Our approach leverages image-, audio-, and face-based features computed using pre-trained neural networks. These features were computed over time and modeled using a gated recurrent unit (GRU) based network followed by a mixture-of-experts model to compute multiclass predictions. We smoothed these predictions using a Butterworth filter for our final result. Our method achieved top performance on three evaluation metrics in the MediaEval 2018 task.
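    A sketch of the final smoothing step described above: applying a low-pass Butterworth filter to a continuous sequence of per-timestep predictions. The filter order, cutoff frequency, prediction rate, and synthetic signal are illustrative assumptions rather than the values used in the submission.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt

    rng = np.random.default_rng(0)
    fs = 1.0                                   # predictions at 1 sample per second (assumed)
    t = np.arange(600)                         # 10 minutes of movie
    raw_valence = np.sin(2 * np.pi * t / 120) + 0.3 * rng.normal(size=t.size)  # noisy model output

    # 2nd-order low-pass Butterworth filter; filtfilt applies it forwards and
    # backwards so the smoothed prediction is not phase-shifted in time.
    b, a = butter(N=2, Wn=0.05, fs=fs, btype="low")
    smoothed_valence = filtfilt(b, a, raw_valence)
    print(smoothed_valence[:5])
    ```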