Michael D. Howell, MD MPH

Michael is the Chief Clinical Officer at Google, where he leads the team of clinical experts who provide guidance for Google’s health-related products, research, and services. His career has been devoted to improving the quality, safety, and science of how care is delivered and helping people get the best information across their health journey. He previously served as the University of Chicago Medicine's Chief Quality Officer, was associate professor of medicine at the University of Chicago and at Harvard Medical School, and practiced pulmonary and critical care medicine for many years. Michael has published more than 100 research articles, editorials, and book chapters, and is the author of Understanding Healthcare Delivery Science, one of the foundational textbooks in the field. He has also served as an advisor for the CDC, for the Centers for Medicare and Medicaid Services, and for the National Academy of Medicine.
Authored Publications
    Importance: Interest in artificial intelligence (AI) has reached an all-time high, and health care leaders across the ecosystem are faced with questions about where, when, and how to deploy AI and how to understand its risks, problems, and possibilities. Observations: While AI as a concept has existed since the 1950s, all AI is not the same. Capabilities and risks of various kinds of AI differ markedly, and on examination 3 epochs of AI emerge. AI 1.0 includes symbolic AI, which attempts to encode human knowledge into computational rules, as well as probabilistic models. The era of AI 2.0 began with deep learning, in which models learn from examples labeled with ground truth. This era brought about many advances both in people’s daily lives and in health care. Deep learning models are task-specific, meaning they do one thing at a time, and they primarily focus on classification and prediction. AI 3.0 is the era of foundation models and generative AI. Models in AI 3.0 have fundamentally new (and potentially transformative) capabilities, as well as new kinds of risks, such as hallucinations. These models can do many different kinds of tasks without being retrained on a new dataset. For example, a simple text instruction will change the model’s behavior. Prompts such as “Write this note for a specialist consultant” and “Write this note for the patient’s mother” will produce markedly different content. Conclusions and Relevance: Foundation models and generative AI represent a major revolution in AI’s capabilities, offering tremendous potential to improve care. Health care leaders are making decisions about AI today. While any heuristic omits details and loses nuance, the framework of AI 1.0, 2.0, and 3.0 may be helpful to decision-makers because each epoch has fundamentally different capabilities and risks.
    Safety principles for medical summarization using generative AI
    Dillon Obika
    Christopher Kelly
    Nicola Ding
    Chris Farrance
    Praney Mittal
    Donny Cheung
    Heather Cole-Lewis
    Madeleine Elish
    Nature Medicine (2024)
    The introduction of Generative AI, particularly large language models, presents exciting opportunities for healthcare. However, their novel capabilities also have the potential to introduce novel risks and hazards. This paper explores the unique safety challenges associated with LLMs in healthcare, using medical text summarization as a motivating example. Using MedLM as a case example, we propose leveraging existing standards and guidance while developing novel approaches tailored to the specific characteristics of LLMs.
    Technical innovations over the past 20 years have changed the way we live many parts of our daily lives, but there is resounding agreement that progress in health care has not kept pace. While many believe that technology will improve health outcomes, there is a real and persistent concern that technology companies do not understand the complexity of health and health care. This challenge is usually discussed as an either/or problem. Either technology companies must disrupt the way that health care works, or they won’t succeed because they will never understand the real world of health and health care. The authors believe that there is a third way, one that establishes a robust, thriving clinical team within a major technology company that brings a deep understanding of the current health care system to bear and a passion to make real improvements. However, clinical teams represent new functions for technology companies, and so they also represent a cultural shift. This article summarizes several years of experience building Google’s clinical team, and later adapting it during Covid-19, to offer six lessons for organizations embarking on similar journeys.
    Barriers to timely data collection and exchange hindered health departments throughout COVID-19, from fax machines creating bottlenecks for disease monitoring to inconsistent reporting of race and ethnicity. Modernizing public health data systems has become a bipartisan postpandemic imperative, with President Trump engaging the US Digital Service to improve data exchange and President Biden issuing an Executive Order on his second day in office to advance public health data and analytics. These initiatives should be informed by the experience of digitizing health care delivery. The Health Information Technology for Economic and Clinical Health (HITECH) Act drove the near-universal adoption of certified electronic health records (EHRs). However, progress was not without pitfalls, from regulatory requirements affecting EHR usability, to new reporting, billing, and patient engagement processes disrupting workflows, to proprietary standards hindering interoperability.1 This Viewpoint explores lessons from HITECH for public health data modernization for COVID-19 and beyond.
    Privacy-first Health Research with Federated Learning
    Adam Sadilek
    Dung Nguyen
    Methun Kamruzzaman
    Benjamin Rader
    Stefan Mellem
    Elaine O. Nsoesie
    Jamie MacFarlane
    Anil Vullikanti
    Madhav Marathe
    Paul C. Eastham
    John S. Brownstein
    John Hernandez
    npj Digital Medicine (2021)
    Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show, on a diverse set of single and multi-site health studies, that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research, across a spectrum of units of federation, model architectures, complexity of learning tasks and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science, aspects that used to be at odds with each other.
    Customization Scenarios for De-identification of Clinical Notes
    Danny Vainstein
    Gavin Edward Bee
    Jack Po
    Jutta Williams
    Kat Chou
    Ronit Yael Slyper
    Rony Amira
    Shlomo Hoory
    Tzvika Hartman
    BMC Medical Informatics and Decision Making (2020)
    Background: Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets. Objective: We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized. Methods: We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset. Results: Fully customized systems remove 97-99% of personally identifying information. Performance of off-the-shelf systems varies by dataset, with performance mostly above 90%. Providing a small labeled dataset or large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems. Conclusion: Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.
    In a general inpatient population, we predicted patient‐specific medication orders based on structured information in the electronic health record (EHR). Data on over three million medication orders from an academic medical center were used to train two machine‐learning models: A deep learning sequence model and a logistic regression model. Both were compared with a baseline that ranked the most frequently ordered medications based on a patient’s discharge hospital service and amount of time since admission. Models were trained to predict from 990 possible medications at the time of order entry. Fifty‐five percent of medications ordered by physicians were ranked in the sequence model’s top‐10 predictions (logistic model: 49%) and 75% ranked in the top‐25 (logistic model: 69%). Ninety‐three percent of the sequence model’s top‐10 prediction sets contained at least one medication that physicians ordered within the next day. These findings demonstrate that medication orders can be predicted from information present in the EHR.
    GUIDELINE TITLE: Mechanical Ventilation in Adult Patients With Acute Respiratory Distress Syndrome DEVELOPER: American Thoracic Society (ATS)/European Society of Intensive Care Medicine (ESICM)/Society of Critical Care Medicine (SCCM) RELEASE DATE: May 1, 2017 TARGET POPULATION: Hospitalized adults with acute respiratory distress syndrome (ARDS). SELECTED MAJOR RECOMMENDATIONS: For all patients with ARDS: • Use lower tidal volumes of 4 to 8 mL/kg per breath, calculated using predicted body weight (PBW) (strong recommendation; moderate confidence in effect estimate). • Use lower inspiratory pressures, targeting a plateau pressure <30 cm H2O (strong recommendation; moderate confidence). For patients with severe ARDS (PaO2/FIO2 ratio <100): • Use prone positioning for at least 12h/d (strong recommendation; moderate confidence). • Do not routinely use high-frequency oscillatory ventilation (strong recommendation; high confidence). • Additional evidence is needed to recommend for or against the use of extracorporeal membrane oxygenation (ECMO) in severe ARDS.
    Scalable and accurate deep learning for electronic health records
    Alvin Rishi Rajkomar
    Eyal Oren
    Nissan Hajaj
    Mila Hardt
    Peter J. Liu
    Xiaobing Liu
    Jake Marcus
    Patrik Per Sundberg
    Kun Zhang
    Yi Zhang
    Gerardo Flores
    Gavin Duggan
    Jamie Irvine
    Kurt Litsch
    Alex Mossin
    Justin Jesada Tansuwan
    De Wang
    Dana Ludwig
    Samuel Volchenboum
    Kat Chou
    Michael Pearson
    Srinivasan Madabushi
    Nigam Shah
    Atul Butte
    npj Digital Medicine (2018)
    Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case-study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient’s chart.
    Ensuring Fairness in Machine Learning to Advance Health Equity
    Alvin Rishi Rajkomar
    Marshall Chin
    Mila Hardt
    Annals of Internal Medicine (2018)
    A central promise of machine learning (ML) is to use historical data to project the future trajectories of patients. Will they have a good or bad outcome? What diagnoses will they have? What treatments should they be given? But in many cases, we do not want the future to look like the past, especially when the past contains patterns of human or structural biases against vulnerable populations. This is not an abstract problem. In a model used to predict future crime based on historical records, black defendants who did not re-offend were classified as high-risk at a substantially higher rate than white defendants who did not re-offend.22 Similar biases have been observed in predictive policing,23 social services,24 and technology companies.25 Given known healthcare disparities, this problem will nearly inevitably surface in medical domains where ML could be applied (Table 1): a "protected group" could be systematically excluded from the benefits of a ML system or even harmed. We argue that ML systems should be fair, which is defined in medical ethics as the "moral obligation to act on the basis of fair adjudication between competing claims."26 Recent advances in computer science have offered mathematical and procedural suggestions to make a ML system fairer: ensuring that it is equally accurate for patients in a protected class, allocates resources to protected classes proportional to need, leads to better patient outcomes for all, and is built and tested in ways that protect the privacy, expectations, and trust of patients. To guide clinicians, administrators, policymakers, and regulators in making principled decisions to improve ML fairness, we illustrate the mechanisms by which a model could be unfair. We then review both technical and non-technical solutions to improve fairness. Finally, we make policy recommendations to stakeholders specifying roles, responsibilities and oversight.