
How Google’s AI can help transform health professions education

August 27, 2025

Mike Schaekermann, Research Lead, and Paul Jhun, Medical Education Lead, Google Research

We explore the utility of Google’s AI models as helpful tools in medical learning environments. By employing a learner-centered and evaluation-driven approach, we seek to reimagine the future of education for health professionals.

The global health workforce is facing a critical shortage, with projections indicating a deficit exceeding 11 million healthcare workers by 2030. At Google, we are researching how AI can help close this gap by transforming health professions education, with studies exploring how Google’s AI models can serve as effective, personalized learning tools in medical learning environments.

Today we present two such studies. First, in “Generative AI for medical education: Insights from a case study with medical students and an AI tutor for clinical reasoning”, published at CHI 2025, we took a qualitative approach to understanding and designing for medical learners through interdisciplinary co-design workshops, rapid prototyping, and user studies. Next, in our latest update of “LearnLM: Improving Gemini for Learning”, we quantitatively assessed LearnLM — our Gemini-based family of models fine-tuned for learning — on medical education scenarios through preference ratings from both medical students and physician educators. Both studies revealed a strong interest in AI tools that can adapt to learners and incorporate preceptor-like behaviors, such as providing constructive feedback and promoting critical thinking. Physician educators rated LearnLM as demonstrating better pedagogy and behaving “more like a very good human tutor” compared to base models. These novel capabilities are now available with Gemini 2.5 Pro.

Understanding the medical learner

Employing a learner-centered approach has been critical in guiding our development of responsible AI tools that scale individualized learning pathways and augment competency-based training. Central to this approach, we first conducted formative user experience (UX) research to understand medical learners’ needs. Through a participatory design process, we began with a co-design workshop that convened an interdisciplinary panel of medical students, clinicians, medical educators, UX designers, and AI researchers to define opportunities for incorporating AI in this space. Insights from this session guided the development of an AI tutor prototype, explicitly designed to guide learners through clinical reasoning anchored on a synthetic clinical vignette.

We then evaluated the AI tutor prototype’s helpfulness in a qualitative user study with eight participants (four medical students and four residents). The study aimed to elicit participants’ learning needs and challenges, as well as their attitudes toward AI assistance in education. Each participant engaged in a 1-hour session with a UX researcher involving semi-structured interviews and interactive sessions with the prototype. All sessions were conducted remotely through video conferencing software; participants accessed the prototype through a web link and shared their screen while interacting with it.

Our thematic analysis of the medical learner interviews revealed various challenges to acquiring clinical reasoning skills and highlighted the potential for generative AI to address these challenges. For example, medical learners expressed strong interest in AI tools capable of adapting to individual learning styles and knowledge gaps. Participants also emphasized the importance of preceptor-like behaviors, such as managing cognitive load, providing constructive feedback, and encouraging questions and reflection.


Overview of the participatory research process aimed at understanding and building for medical learners through an interdisciplinary co-design workshop, rapid research prototyping, and qualitative user studies.

Meeting medical learners where they are

Building on these insights, we conducted a blinded feasibility study with medical students and physician educators to quantitatively assess LearnLM’s pedagogical qualities in medical education settings compared with Gemini 1.5 Pro as the base model. In collaboration with experts, we designed a set of 50 synthetic evaluation scenarios across a range of medical education subjects, from pre-clinical topics, such as platelet activation, to clinical topics, like neonatal jaundice, reflecting the core competencies and standards in medical education.

We recruited medical students from both preclinical and clinical phases of training to engage in interactive conversations with both LearnLM and the base model, in a randomized and blinded manner. Students used the evaluation scenarios to role-play as different types of learners across a range of learning goals and personas, generating 290 conversations for analysis. Each scenario provided learners with context to standardize the interaction as much as possible between both models, including a learning goal, grounding materials, a learner persona, a conversation plan, and the initial query used by the learner to start the conversation.


Example scenario used to evaluate LearnLM capabilities in the context of medical education settings.
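To make the scenario structure concrete, below is a minimal sketch of how one such evaluation scenario could be represented in code. The field names and example values are illustrative assumptions, not the schema used in the study.

from dataclasses import dataclass

@dataclass
class EvaluationScenario:
    """One synthetic medical education scenario (illustrative fields only)."""
    subject: str                     # e.g., "platelet activation" or "neonatal jaundice"
    learning_goal: str               # what the learner should achieve in the conversation
    grounding_materials: list[str]   # reference material to anchor the model's teaching
    learner_persona: str             # the type of learner the student role-plays
    conversation_plan: str           # how the student should steer the interaction
    initial_query: str               # the learner's opening message to the model

# Hypothetical example, loosely based on one of the clinical topics mentioned above.
scenario = EvaluationScenario(
    subject="neonatal jaundice",
    learning_goal="Work through the differential diagnosis of jaundice in a newborn",
    grounding_materials=["excerpt from a pediatrics reference text"],
    learner_persona="third-year medical student on a pediatrics rotation",
    conversation_plan="Ask for an overview first, then probe bilirubin physiology",
    initial_query="Can you help me reason through why a 3-day-old might be jaundiced?",
)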

Students then rated model behavior by comparing the two interactions for each scenario side-by-side across four criteria: (1) overall experience, (2) meeting learning needs, (3) enjoyability, and (4) understandability. Physician educators rated model behavior by reviewing conversation transcripts and scenario specifications. For each scenario, educators reviewed the transcripts from both learner-model conversations side-by-side, and provided preference ratings across five criteria: (1) demonstrating pedagogy, (2) behaving like a very good human tutor, (3) instruction following, (4) adapting to the learner, and (5) supporting the learning goal. We collected a median of three independent educator reviews per conversation pair. All preference ratings were done in a randomized and blinded manner using 7-point scales, which reflected a spectrum of preference strengths including the option to express no preference between the two models.
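As a rough sketch of how such side-by-side, 7-point preference ratings can be aggregated (the exact encoding and analysis used in the study may differ), each rating can be mapped to a signed score with the midpoint expressing no preference, and the independent educator reviews combined per conversation pair:

import statistics

def to_signed_preference(rating: int) -> int:
    """Map a 7-point side-by-side rating to a signed score in [-3, +3].
    Assumption for illustration: 1-3 favor model A, 4 is no preference,
    and 5-7 favor model B."""
    assert 1 <= rating <= 7
    return rating - 4

def aggregate_reviews(ratings: list[int]) -> float:
    """Combine independent educator ratings for one conversation pair
    (the study collected a median of three reviews per pair)."""
    return statistics.median(to_signed_preference(r) for r in ratings)

# Example: two educators mildly prefer model B; one expresses no preference.
print(aggregate_reviews([5, 4, 6]))  # -> 1 (mild preference for model B)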

Physician educators consistently preferred LearnLM across all five comparison criteria. They judged LearnLM particularly positively for demonstrating better pedagogy (on average, +6.1% on our rating scale) and for behaving “more like a very good human tutor” (+6.8%). When we look simply at whether educators expressed any preference one way or the other, regardless of its magnitude, LearnLM emerged as their choice in a clear majority of assessments across every criterion. Medical students indicated the strongest positive preference for LearnLM being more enjoyable to interact with (on average, +9.9% on our rating scale). Student preferences were less pronounced for the other three comparison criteria, though they also directionally favored LearnLM.
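One plausible reading of figures like “+6.1% on our rating scale” is the mean signed preference normalized by the scale’s half-range, while the majority-preference view simply counts which model was favored at all, ignoring strength. The sketch below illustrates both summaries under that assumption; it is not the study’s published analysis code.

def mean_preference_pct(signed_scores: list[int]) -> float:
    """Mean signed preference as a percentage of the scale's half-range (3)."""
    return 100 * sum(signed_scores) / (3 * len(signed_scores))

def majority_preference(signed_scores: list[int]) -> float:
    """Fraction of ratings favoring model B, among ratings that expressed
    any preference either way (magnitude ignored)."""
    decided = [s for s in signed_scores if s != 0]
    return sum(s > 0 for s in decided) / len(decided)

scores = [1, 0, 2, -1, 1, 0, 1]  # hypothetical signed scores from the mapping above
print(f"{mean_preference_pct(scores):+.1f}%")  # +19.0% toward model B
print(f"{majority_preference(scores):.0%}")    # 80% of decided ratings favor B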

This study points to LearnLM’s potential to transform education and learning paradigms and scale a competent health workforce. None of the data used for model development or evaluation in this study included real patient data. See the tech report for modeling details.


Preferences expressed by physician educators and medical students, showing the proportion of ratings that favored each model across medical education scenarios.

Reimagining health professions education

We recently shared this research at the MedEd on the Edge conference at the Nobel Forum and facilitated a hands-on workshop with the international medical education community to explore these possibilities. We recognize the dual role of educators as both pedagogical experts and explorers in this rapidly evolving knowledge domain. Realizing a responsible future requires careful attention to challenges such as ensuring accuracy, mitigating bias, and maintaining the crucial role of human interaction and oversight. This shift also underscores the need to re-evaluate competencies and entrustable professional activities, and to develop curricula that cultivate adaptive expertise, focusing not only on AI applications in education but also on teaching a foundational understanding of AI itself. At this convergence, generative AI can serve as a catalyst for productive struggle, fostering deeper understanding and critical thinking. As the journey has only just begun, below are a few examples of how Google’s AI can potentially transform health professions education.


Examples of how educators and learners can use Google’s AI to reimagine education for health professions. LearnLM capabilities are now integrated and available with Gemini 2.5 Pro.

Conclusion

This research continues to lay the groundwork for the effective design and implementation of personalized learning experiences, offering an opportunity to accelerate clinical competency and ultimately improve health outcomes by reimagining health professions education. We are committed to partnering with the health professions education community to thoughtfully and responsibly prepare future healthcare professionals to thrive in an AI-augmented healthcare landscape.

Acknowledgements

The research described here is a joint effort across Google Research, Google for Health, Google DeepMind, and partnering teams. The following researchers contributed to this work: Kevin McKee, Dan Gillick, Irina Jurenka, Markus Kunesch, Kaiz Alarakyia, Miriam Schneider, Jenn Sturgeon, Maggie Shiels, Amy Wang, Roma Ruparel, Anna Iurchenko, Mahvish Nagda, Julie Anne Séguin, Divya Pandya, Patricia Strachan, Renee Wong, Renee Schneider, Viknesh Sounderajah, Pete Clardy, Garth Graham, Megan Jones Bell, Michael Howell, Jonathan Krause, Christopher Semturs, Dale Webster, Avinatan Hassidim, Joëlle Barral, Ronit Levavi Morad and Yossi Matias. Special thanks to participants who contributed to these studies.