Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10595 publications
Validation of a Deep Learning Model for Diabetic Retinopathy on Patients with Young-Onset Diabetes
Tony Tan-Torres
Pradeep Praveen
Divleen Jeji
Arthur Brant
Xiang Yin
Lu Yang
Tayyeba Ali
Ilana Traynis
Dushyantsinh Jadeja
Rajroshan Sawhney
Sunny Virmani
Pradeep Venkatesh
Nikhil Tandon
Ophthalmology and Therapy (2025)
Preview abstract
Introduction
While many deep learning systems (DLSs) for diabetic retinopathy (DR) have been developed and validated on cohorts with an average age of 50s or older, fewer studies have examined younger individuals. This study aimed to understand DLS performance for younger individuals, who tend to display anatomic differences, such as prominent retinal sheen. This sheen can be mistaken for exudates or cotton wool spots, and potentially confound DLSs.
Methods
This was a prospective cross-sectional cohort study in a “Diabetes of young” clinic in India, enrolling 321 individuals between ages 18 and 45 (98.8% with type 1 diabetes). Participants had fundus photographs taken and the photos were adjudicated by experienced graders to obtain reference DR grades. We defined a younger cohort (age 18–25) and an older cohort (age 26–45) and examined differences in DLS performance between the two cohorts. The main outcome measures were sensitivity and specificity for DR.
Results
Eye-level sensitivity for moderate-or-worse DR was 97.6% [95% confidence interval (CI) 91.2, 98.2] for the younger cohort and 94.0% [88.8, 98.1] for the older cohort (p = 0.418 for difference). The specificity for moderate-or-worse DR significantly differed between the younger and older cohorts, 97.9% [95.9, 99.3] and 92.1% [87.6, 96.0], respectively (p = 0.008). Similar trends were observed for diabetic macular edema (DME); sensitivity was 79.0% [57.9, 93.6] for the younger cohort and 77.5% [60.8, 90.6] for the older cohort (p = 0.893), whereas specificity was 97.0% [94.5, 99.0] and 92.0% [88.2, 95.5] (p = 0.018). Retinal sheen presence (94% of images) was associated with DME presence (p < 0.0001). Image review suggested that sheen presence confounded reference DME status, increasing noise in the labels and depressing measured sensitivity. The gradability rate for both DR and DME was near-perfect (99% for both).
Conclusion
DLS-based DR screening performed well in younger individuals aged 18–25, with comparable sensitivity and higher specificity compared to individuals aged 26–45. Sheen presence in this cohort made identification of DME difficult for graders and depressed measured DLS sensitivity; additional studies incorporating optical coherence tomography may improve accuracy of measuring DLS DME sensitivity.
View details
XR Blocks: Accelerating Human-Centered AI + XR Innovation
Nels Numan
Evgenii Alekseev
Alex Cooper
Min Xia
Scott Chung
Jeremy Nelson
Xiuxiu Yuan
Jolica Dias
Tim Bettridge
Benjamin Hersh
Michelle Huynh
Konrad Piascik
Ricardo Cabello
Google, XR, XR Labs (2025)
Preview abstract
We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like PyTorch and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks provides a modular architecture with plug-and-play components for core abstraction in AI + XR: user, world, peers; interface, context, and agents. Crucially, it is designed with the mission of "minimum code from idea to reality", accelerating rapid prototyping of complex AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive prototype.
View details
Origin-destination travel demand estimation: an approach that scales worldwide, and its application to five metropolitan highway networks
Christopher Bian
Yechen Li
Willa Ng
Bin Yan
Janny Zhang
Transportation Research Part B: Methodological (2025) (to appear)
Preview abstract
Estimating Origin-Destination (OD) travel demand is vital for effective urban planning
and traffic management. Developing universally applicable OD estimation
methodologies is significantly challenged by the pervasive scarcity of high-fidelity traffic
data and the difficulty in obtaining city-specific prior OD estimates (or seed ODs), which
are often prerequisite for traditional approaches. Our proposed method directly
estimates OD travel demand by systematically leveraging aggregated, anonymized
statistics from Google Maps Traffic Trends, obviating the need for conventional census
or city-provided OD data. The OD demand is estimated by formulating a single-level,
one-dimensional, continuous nonlinear optimization problem with nonlinear equality
and bound constraints to replicate highway path travel times. The method achieves
efficiency and scalability by employing a differentiable analytical macroscopic network
model. This model by design is computationally lightweight, distinguished by its
parsimonious parameterization that requires minimal calibration effort and its capacity
for instantaneous evaluation. These attributes ensure the method's broad applicability
and practical utility across diverse cities globally. Using segment sensor counts from
Los Angeles and San Diego highway networks, we validate our proposed approach,
demonstrating a two-thirds to three-quarters improvement in the fit to segment count
data over a baseline. Beyond validation, we establish the method's scalability and
robust performance in replicating path travel times across diverse highway networks,
including Seattle, Orlando, Denver, Philadelphia, and Boston. In these expanded
evaluations, our method not only aligns with simulation-based benchmarks but also
achieves an average 13% improvement in it's ability to fit travel time data compared to
the baseline during afternoon peak hours.
View details
Preview abstract
A high level talk about quantum computing at Google. I am giving an invited talk at the Kavli Frontiers of Science. *Please note that I am only using slides that have already been presented publicly by others on the team. All slides have already previously passed review.*
View details
Advancing seasonal prediction of tropical cyclone activity with a hybrid AI-physics climate model
Gan Zhang
Megha Rao
Janni Yuval
Ming Zhao
Environmental Research Letters (2025)
Preview abstract
Machine learning (ML) models are successful with weather forecasting and have shown progress in climate simulations, yet leveraging them for useful climate predictions needs exploration. Here we show this feasibility using neural general circulation model (NeuralGCM), a hybrid ML-physics atmospheric model developed by Google, for seasonal predictions of large-scale atmospheric variability and Northern Hemisphere tropical cyclone (TC) activity. Inspired by physical model studies, we simplify boundary conditions, assuming sea surface temperature and sea ice follow their climatological cycle but persist anomalies present at the initialization time. With such forcings, NeuralGCM can generate 100 simulation days in ∼8 min with a single graphics processing unit while simulating realistic atmospheric circulation and TC climatology patterns. This configuration yields useful seasonal predictions (July–November) for the tropical atmosphere and various TC activity metrics. Notably, the predicted and observed TC frequency in the North Atlantic and East Pacific basins are significantly correlated during 1990–2023 (r = ∼0.7), suggesting prediction skill comparable to existing physical GCMs. Despite challenges associated with model resolution and simplified boundary forcings, the model-predicted interannual variations demonstrate significant correlations with the observed sub-basin TC tracks (p < 0.1) and basin-wide accumulated cyclone energy (ACE) (p < 0.01) of the North Atlantic and North Pacific basins. These findings highlight the promise of leveraging ML models with physical insights to model TC risks and deliver seamless weather-climate predictions.
View details
Preview abstract
We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags, and only aggregate label values in each bag are available. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal. From an algorithmic viewpoint, we rely on carefully designed variants of Empirical Risk Minimization, and Stochastic Gradient Descent algorithms, combined with ad hoc variance reduction techniques. On one hand, our theoretical results improve in important ways on the existing literature on LLP, specifically in the way the sample complexity depends on the bag size. On the other hand, we validate our algorithmic solutions on several datasets, demonstrating improved empirical performance (better accuracy for less samples) against recent baselines.
View details
Bridging Sign and Spoken Languages: Pseudo GlossGeneration for Sign Language Translation
Trevor Cohn
Jianyuan Guo
Advances in Neural Information Processing Systems (NeurIPS) (2025)
Preview abstract
Sign Language Translation (SLT) aims to map sign language videos to spoken language text. A common approach leverages gloss annotations as an intermediate representation, decomposing SLT into two sub-tasks: video-to-gloss recognition and gloss-to-text translation. While effective, this paradigm relies on expert-annotated gloss labels, which are costly and increasingly unavailable in many datasets, limiting scalability.
To address this challenge, we propose a gloss-free pseudo gloss generation framework that eliminates the need for human-annotated glosses while preserving the structured intermediate representation. Specifically, we prompt a Large Language Model (LLM) with example text-gloss pairs to extract potential sign-related gloss words from the text by leveraging its in-context learning capability.
To mitigate the inherent misalignment between generated pseudo glosses and sign sequences in the video, we further refine their order by formulating the alignment as a weakly supervised learning problem.
With the reordered pseudo-glosses, additional alignment losses such as CTC can be incorporated to enhance supervision. We train our SLT model—comprising a vision encoder and a translator—under a three-stage pipeline, effectively bridging the gap between sign and spoken language.
Despite its simplicity, our approach outperforms previous state-of-the-art gloss-free frameworks across three SLT benchmarks and achieves competitive results with gloss-based methods.
View details
Preview abstract
Invisible labor is work that is either not fully visible or not appropriately compensated. In open source software (OSS) ecosystems, essential tasks that do not involve code (like content moderation) often become invisible to the detriment of individuals and organizations. However, invisible labor is sufficiently difficult to measure that we do not know how much of OSS activities are invisible. Our study addresses this challenge, demonstrating that roughly half of OSS work is invisible. We do this by developing a cognitive anchoring survey technique that measures OSS developer self-assessments of labor visibility and attribution. Survey respondents (n=142) reported that their work is more likely to be invisible (2 in 3 tasks) than visible, and that half (50.1%) is uncompensated. Priming participants with the idea of visibility caused participants to think their work was more visible, and that visibility was less important, than those primed with invisibility. We also found evidence that tensions between attribution motivations probably increase how common invisible labor is. This suggests that advertising OSS activities as "open" may lead contributors to overestimate how visible their labor actually is. Our findings suggest benefits to working with varied stakeholders to make select, collectively valued activities visible, and increasing compensation in valued forms (like attribution, opportunities, or pay) when possible. This could improve fairness in software development while providing greater transparency into work designs that help organizations and communities achieve their goals.
View details
Society-Centric Product Innovation in the Era of Customer Obsession
International Journal of Science and Research Archive (IJSRA), Volume 14 - Issue 1 (2025)
Preview abstract
This article provides a comprehensive analysis of the evolving landscape of innovation in the technology sector, with a focus on the intersection of technological progress and social responsibility. The article explores key challenges facing the industry, including public trust erosion, digital privacy concerns, and the impact of automation on workforce dynamics. It investigates responsible innovation frameworks' emergence and implementation across various organizations, highlighting the transformation from traditional development approaches to more society-centric models. The article demonstrates how companies balance innovation speed with social responsibility, incorporate ethical considerations into their development processes, and address digital disparities across different demographics. By examining how companies balance the pace of innovation with ethical responsibilities, integrate social considerations into their processes, and address digital inequities across diverse demographics, the article underscores the transformative potential of these frameworks. Through insights into cross-functional teams, impact assessment tools, and stakeholder engagement strategies, it demonstrates how responsible innovation drives both sustainable business value and societal progress.
View details
Preview abstract
The global adoption of Large Language Models (LLMs) in healthcare shows promise for enhancing clinical workflows and improving patient outcomes. However, Automatic Speech
Recognition (ASR) errors in critical medical entities remain a significant challenge. These
errors can lead to severe consequences if undetected. This study investigates the prevalence and impact of ASR errors in medical transcription across Africa, Europe, and North America. By examining variations in accented English across three continents, we analyze the impact of regional speech patterns on ASR performance. Our research quantifies both the potential and limitations of LLMs in mitigating ASR inaccuracies within various medical settings, with particular attention to performance variations across regional accents and medical terminology. Our findings highlight significant disparities in ASR accuracy across regions and identify specific conditions under which LLM corrections prove most effective.
View details
Preview abstract
In this paper, we introduce a novel estimator for vision-aided inertial navigation systems (VINS), the Preconditioned Cholesky-based Square Root Information Filter (PC-SRIF). When working with linear systems, employing Cholesky decomposition offers superior efficiency but can compromise numerical stability. Due to this, existing VINS literature on (Square Root) Information Filters often opts for QR decomposition on platforms where single precision is preferred, avoiding the numerical challenges associated with Cholesky decomposition. While these issues are often attributed to the ill-conditioned information matrix in VINS, our analysis reveals that this is not an inherent property of VINS but rather a consequence of specific parametrizations. We identify several factors that contribute to an ill-conditioned information matrix and propose a preconditioning technique to mitigate these conditioning issues. Building on this analysis, we present PC-SRIF, which exhibits remarkable stability in performing Cholesky decomposition using single precision when solving linear systems in VINS. Consequently, PC-SRIF achieves superior theoretical efficiency compared to alternative estimators. To validate the efficiency advantages and numerical stability of PC-SRIF based VINS, we have conducted carefully controlled experiments, which provide empirical evidence in support of our theoretical findings. Empirically, in our VINS, PC-SRIF achieves more than 2x better efficiency than SRIF.
View details
A Call to Action: Advancing the Conversation Around Neurodivergent Education-Employment Transitions
Dannie Lynn Fountain
Vicki Baker
Kevin Danley
Closing the Gap (2025)
Preview abstract
Neurodiversity is still largely stigmatized and excluded from DEIB frameworks and related organizational initiatives, despite the increased recognition regarding the benefits of neuroinclusion within the education and corporate spheres. We seek to address this knowledge-to-practice gap through the creation of the Neurodiversity Engagement Framework. By highlighting supports needed for neurodivergent individuals, and those that support them, the framework helps neurodivergent individuals navigate within and across higher education and industry contexts. Informed by an interdisciplinary review of literature from higher education, industry, and corporate leadership contexts, the Neurodiversity Engagement Framework brings to light prevailing challenges within practices and policies, serving as a guide for the creation of a more supportive foundation for neurodiverse individuals to thrive. In this manuscript, readers are encouraged to consider the myriad of impacts that neurodiversity has on higher education and industry experiences and the ways that organizations can be more proactive in their support of this growing population. To conclude, we offer a roadmap for future research and practice to further elucidate ways academic and corporation leaders and policymakers can effectively support neurodivergent individuals.
View details
Preview abstract
We discuss the challenges posed by growing machine learning workloads on
datacenter networks and present how Google’s Jupiter network fabrics effectively support
diverse traffic.
View details
Toward Sensor-In-the-Loop LLM Agent: Benchmarks and Implications
Zhiwei Ren
Junbo Li
Minjia Zhang
Di Wang
Longfei Shangguan
SenSys 2025 - The 23rd ACM Conference on Embedded Networked Sensor Systems (2025)
Preview abstract
This paper advocates for sensor-informed personal agents that can take advantage of sensor hints on wearables to enhance the personal agent's response. We demonstrate that such a sensor-in-the-loop design paradigm can be easily integrated into existing LLM agents by building a prototype named WellMax based on existing well-developed techniques such as structured prompt tuning and few-shot prompting. The head-to-head comparison with a non-sensor-informed agent across five use scenarios demonstrates that this sensor-in-the-loop design can effectively improve users' needs and their overall experience. The deep-dive into agents' replies and participants' feedback further reveals that sensor-in-the-loop agents not only provide more contextually relevant responses but also exhibit a greater understanding of user priorities and situational nuances. We further conduct two case studies to examine the potential pitfalls and distill key insights from this sensor-in-the-loop agent. We believe this work sets the stage for more intelligent, empathetic, and effective interactions in future AI-driven personal assistants.
View details
Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
Fei Wang
The Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) (2025) (to appear)
Preview abstract
Retrieval-Augmented Generation (RAG), while effective in integrating external knowledge to address the limitations of large language models (LLMs), can be undermined by imperfect retrieval, which may introduce irrelevant, misleading, or even malicious information. Despite its importance, previous studies have rarely explored the behavior of RAG through joint analysis on how errors from imperfect retrieval attribute and propagate, and how potential conflicts arise between the LLMs' internal knowledge and external sources. We find that imperfect retrieval augmentation might be inevitable and quite harmful, through controlled analysis under realistic conditions. We identify the knowledge conflicts between LLM-internal and external knowledge from retrieval as a bottleneck to overcome in the post-retrieval stage of RAG. To render LLMs resilient to imperfect retrieval, we propose Astute RAG, a novel RAG approach that adaptively elicits essential information from LLMs' internal knowledge, iteratively consolidates internal and external knowledge with source-awareness, and finalizes the answer according to information reliability. Our experiments using Gemini and Claude demonstrate that Astute RAG significantly outperforms previous robustness-enhanced RAG methods. Notably, Astute RAG is the only approach that matches or exceeds the performance of LLMs without RAG under worst-case scenarios. Further analysis reveals that Astute RAG effectively resolves knowledge conflicts, improving the reliability and trustworthiness of RAG systems.
View details