Michael D. Howell, MD MPH
Michael is the Chief Clinical Officer at Google, where he leads the team of clinical experts who provide guidance for Google’s health-related products, research, and services. His career has been devoted to improving the quality, safety, and science of how care is delivered and helping people get the best information across their health journey. He previously served as the University of Chicago Medicine's Chief Quality Officer, was associate professor of medicine at the University of Chicago and at Harvard Medical School, and practiced pulmonary and critical care medicine for many years. Michael has published more than 100 research articles, editorials, and book chapters, and is the author of Understanding Healthcare Delivery Science, one of the foundational textbooks in the field. He has also served as an advisor for the CDC, for the Centers for Medicare and Medicaid Services, and for the National Academy of Medicine.
Authored Publications
Importance: Interest in artificial intelligence (AI) has reached an all-time high, and health care leaders across the ecosystem are faced with questions about where, when, and how to deploy AI and how to understand its risks, problems, and possibilities.
Observations: While AI as a concept has existed since the 1950s, not all AI is the same. Capabilities and risks of various kinds of AI differ markedly, and on examination 3 epochs of AI emerge. AI 1.0 includes symbolic AI, which attempts to encode human knowledge into computational rules, as well as probabilistic models. The era of AI 2.0 began with deep learning, in which models learn from examples labeled with ground truth. This era brought about many advances both in people’s daily lives and in health care. Deep learning models are task-specific, meaning they do one thing at a time, and they primarily focus on classification and prediction. AI 3.0 is the era of foundation models and generative AI. Models in AI 3.0 have fundamentally new (and potentially transformative) capabilities, as well as new kinds of risks, such as hallucinations. These models can do many different kinds of tasks without being retrained on a new dataset. For example, a simple text instruction will change the model’s behavior. Prompts such as “Write this note for a specialist consultant” and “Write this note for the patient’s mother” will produce markedly different content.
Conclusions and Relevance: Foundation models and generative AI represent a major revolution in AI’s capabilities, offering tremendous potential to improve care. Health care leaders are making decisions about AI today. While any heuristic omits details and loses nuance, the framework of AI 1.0, 2.0, and 3.0 may be helpful to decision-makers because each epoch has fundamentally different capabilities and risks.
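As a rough illustration of the prompt-driven behavior described in the Observations, the sketch below wraps the same clinical note in two different instructions and passes each to a text-generation model. The generate helper is a hypothetical placeholder, not any particular product's API, and the clinical note is invented.

```python
# Minimal sketch of how a text instruction ("prompt") changes a generative
# model's output without any retraining. `generate` is a hypothetical stand-in
# for a real text-generation API; replace it with your own client call.

def generate(prompt: str) -> str:
    # Placeholder only: a real implementation would call a language-model API.
    # Echoing part of the prompt keeps this sketch self-contained and runnable.
    return f"[model output for prompt beginning: {prompt[:40]}...]"

clinical_note = (
    "58-year-old with community-acquired pneumonia, improving on ceftriaxone "
    "and azithromycin; afebrile for 24 hours; plan to discharge tomorrow."
)

prompts = {
    "specialist": "Write this note for a specialist consultant:\n" + clinical_note,
    "family": "Write this note for the patient's mother:\n" + clinical_note,
}

for audience, prompt in prompts.items():
    # The same underlying model produces markedly different content
    # depending only on the instruction it is given.
    print(audience, "->", generate(prompt))
```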
Technical innovations over the past 20 years have changed the way we live many parts of our daily lives, but there is resounding agreement that progress in health care has not kept pace. While many believe that technology will improve health outcomes, there is a real and persistent concern that technology companies do not understand the complexity of health and health care. This challenge is usually discussed as an either/or problem. Either technology companies must disrupt the way that health care works, or they won’t succeed because they will never understand the real world of health and health care. The authors believe that there is a third way — one that establishes a robust, thriving clinical team within a major technology company that brings a deep understanding of the current health care system to bear and a passion to make real improvements. However, clinical teams represent new functions for technology companies, and so they also represent a cultural shift. This article summarizes several years of experience building Google’s clinical team, and later adapting it during Covid-19, to offer six lessons for organizations embarking on similar journeys.
Barriers to timely data collection and exchange hindered health departments throughout COVID-19, from fax machines creating bottlenecks for disease monitoring to inconsistent reporting of race and ethnicity. Modernizing public health data systems has become a bipartisan postpandemic imperative, with President Trump engaging the US Digital Service to improve data exchange and President Biden issuing an Executive Order on his second day in office to advance public health data and analytics.
These initiatives should be informed by the experience of digitizing health care delivery. The Health Information Technology for Economic and Clinical Health (HITECH) Act drove the near-universal adoption of certified electronic health records (EHRs). However, progress was not without pitfalls, from regulatory requirements affecting EHR usability, to new reporting, billing, and patient engagement processes disrupting workflows, to proprietary standards hindering interoperability.1 This Viewpoint explores lessons from HITECH for public health data modernization for COVID-19 and beyond.
Privacy-first Health Research with Federated Learning
Adam Sadilek
Dung Nguyen
Methun Kamruzzaman
Benjamin Rader
Stefan Mellem
Elaine O. Nsoesie
Jamie MacFarlane
Anil Vullikanti
Madhav Marathe
Paul C. Eastham
John S. Brownstein
John Hernandez
npj Digital Medicine (2021)
Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show—on a diverse set of single and multi-site health studies—that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research—across a spectrum of units of federation, model architectures, complexity of learning tasks and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science—aspects that used to be at odds with each other.
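A minimal sketch of the underlying idea, federated averaging over clipped, noise-perturbed client updates, is shown below. The simulated sites, clipping norm, and noise scale are illustrative assumptions and do not reflect the models, data, or privacy parameters used in the study.

```python
# Toy sketch of federated averaging with differentially private client updates.
# Sites never share raw data; each sends only a clipped, noised model update.
# Clip norm and noise multiplier are illustrative, not the study's parameters.
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_features = 5, 10
clip_norm, noise_multiplier, lr = 1.0, 0.5, 0.1

# Simulated private data held at each site (never sent to the server).
true_w = rng.normal(size=n_features)
site_data = []
for _ in range(n_sites):
    X = rng.normal(size=(200, n_features))
    y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)
    site_data.append((X, y))

def local_update(weights, X, y, lr):
    """One logistic-regression gradient step on a site's own data."""
    preds = 1.0 / (1.0 + np.exp(-(X @ weights)))
    grad = X.T @ (preds - y) / len(y)
    return -lr * grad  # proposed change to the shared weights

weights = np.zeros(n_features)
for _round in range(50):
    updates = []
    for X, y in site_data:
        delta = local_update(weights, X, y, lr)
        # Clip each site's update to bound any one site's influence.
        delta = delta * min(1.0, clip_norm / (np.linalg.norm(delta) + 1e-12))
        updates.append(delta)
    # The server sees only clipped updates plus calibrated Gaussian noise.
    noise = rng.normal(scale=noise_multiplier * clip_norm / n_sites, size=n_features)
    weights = weights + np.mean(updates, axis=0) + noise

print("learned weights (trained without pooling raw data):", np.round(weights, 2))
```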
Predicting inpatient medication orders from electronic health record data
Kathryn Rough
Kun Zhang
Atul J. Butte
Alvin Rajkomar
Clinical Pharmacology and Therapeutics (2020)
In a general inpatient population, we predicted patient-specific medication orders based on structured information in the electronic health record (EHR). Data on over three million medication orders from an academic medical center were used to train two machine-learning models: a deep learning sequence model and a logistic regression model. Both were compared with a baseline that ranked the most frequently ordered medications based on a patient’s discharge hospital service and amount of time since admission. Models were trained to predict from 990 possible medications at the time of order entry. Fifty-five percent of medications ordered by physicians were ranked in the sequence model’s top-10 predictions (logistic model: 49%) and 75% ranked in the top-25 (logistic model: 69%). Ninety-three percent of the sequence model’s top-10 prediction sets contained at least one medication that physicians ordered within the next day. These findings demonstrate that medication orders can be predicted from information present in the EHR.
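The headline figures here are top-k recall: how often the medication a physician actually ordered appears among a model's k highest-ranked suggestions. The sketch below computes that metric on invented rankings and orders; it is not the study's evaluation code.

```python
# Minimal sketch of the top-k evaluation described above: for each order,
# check whether the medication the physician actually ordered appears in the
# model's k highest-ranked predictions. Rankings and orders are made up here.

def top_k_recall(ranked_predictions, actual_orders, k):
    """Fraction of orders whose actual medication is in the model's top k."""
    hits = sum(
        actual in ranked[:k]
        for ranked, actual in zip(ranked_predictions, actual_orders)
    )
    return hits / len(actual_orders)

# Each inner list is a model's ranking over candidate medications (best first).
ranked_predictions = [
    ["acetaminophen", "ondansetron", "heparin", "insulin lispro"],
    ["vancomycin", "piperacillin-tazobactam", "acetaminophen", "morphine"],
    ["insulin glargine", "metoprolol", "furosemide", "potassium chloride"],
]
actual_orders = ["heparin", "morphine", "metoprolol"]

print(top_k_recall(ranked_predictions, actual_orders, k=2))  # 1 of 3 orders hit
print(top_k_recall(ranked_predictions, actual_orders, k=4))  # all 3 orders hit
```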
Customization Scenarios for De-identification of Clinical Notes
Danny Vainstein
Gavin Edward Bee
Jack Po
Jutta Williams
Kat Chou
Ronit Yael Slyper
Rony Amira
Shlomo Hoory
Tzvika Hartman
BMC Medical Informatics and Decision Making (2020)
Background: Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets.
Objective: We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized.
Methods: We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset.
Results: Fully customized systems remove 97-99% of personally identifying information. Performance of off-the-shelf systems varies by dataset but is mostly above 90%. Providing a small labeled dataset or a large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems.
Conclusion: Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.
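The percentages above amount to recall over annotated identifiers: of the tokens marked as personal information in a gold-standard note, how many did the system remove? The toy calculation below illustrates that metric with invented spans rather than data from the corpora named in the abstract.

```python
# Toy sketch of the headline de-identification metric: token-level recall of
# personally identifying information (PHI). Gold and system outputs here are
# invented examples, not drawn from i2b2, PhysioNet, or MIMIC-III.

def phi_recall(gold_phi_tokens, removed_tokens):
    """Fraction of gold-standard PHI tokens that the system actually removed."""
    gold, removed = set(gold_phi_tokens), set(removed_tokens)
    return len(gold & removed) / len(gold)

# Tokens a human annotator marked as PHI in one (fictional) clinical note.
gold_phi_tokens = ["John", "Smith", "03/14/1962", "617-555-0199", "Boston"]

# Tokens an automated system chose to redact from the same note.
removed_tokens = ["John", "Smith", "03/14/1962", "617-555-0199"]

print(phi_recall(gold_phi_tokens, removed_tokens))  # 0.8: one identifier missed
```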
Ensuring Fairness in Machine Learning to Advance Health Equity
Alvin Rishi Rajkomar
Marshall Chin
Mila Hardt
Annals of Internal Medicine (2018)
A central promise of machine learning (ML) is to use historical data to project the future trajectories of patients. Will they have a good or bad outcome? What diagnoses will they have? What treatments should they be given? But in many cases, we do not want the future to look like the past, especially when the past contains patterns of human or structural biases against vulnerable populations.
This is not an abstract problem. In a model used to predict future crime based on historical records, black defendants who did not re-offend were classified as high-risk at a substantially higher rate than white defendants who did not re-offend.22 Similar biases have been observed in predictive policing,23 social services,24 and technology companies.25 Given known healthcare disparities, this problem will nearly inevitably surface in medical domains where ML could be applied (Table 1): a "protected group" could be systematically excluded from the benefits of an ML system or even harmed. We argue that ML systems should be fair, which is defined in medical ethics as the "moral obligation to act on the basis of fair adjudication between competing claims."26
Recent advances in computer science have offered mathematical and procedural suggestions to make an ML system fairer: ensuring it is equally accurate for patients in a protected class, allocating resources to protected classes in proportion to need, leading to better patient outcomes for all, and building and testing it in ways that protect the privacy, expectations, and trust of patients.
To guide clinicians, administrators, policymakers, and regulators in making principled decisions to improve ML fairness, we illustrate the mechanisms by which a model could be unfair. We then review both technical and non-technical solutions to improve fairness. Finally, we make policy recommendations to stakeholders specifying roles, responsibilities, and oversight.
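One criterion mentioned above, that a model be equally accurate for patients in a protected class, can be checked by computing performance separately for each group and comparing. The sketch below does so for accuracy and true-positive rate on invented labels, predictions, and group assignments.

```python
# Illustrative check of one fairness criterion discussed above: compare a
# model's accuracy and true-positive rate across protected groups.
# Labels, predictions, and group assignments below are invented for the sketch.
from collections import defaultdict

def per_group_metrics(y_true, y_pred, groups):
    """Return {group: (accuracy, true_positive_rate)} for binary outcomes."""
    buckets = defaultdict(list)
    for yt, yp, g in zip(y_true, y_pred, groups):
        buckets[g].append((yt, yp))
    metrics = {}
    for g, pairs in buckets.items():
        accuracy = sum(yt == yp for yt, yp in pairs) / len(pairs)
        positives = [(yt, yp) for yt, yp in pairs if yt == 1]
        tpr = (sum(yp == 1 for _, yp in positives) / len(positives)
               if positives else float("nan"))
        metrics[g] = (accuracy, tpr)
    return metrics

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

for group, (acc, tpr) in per_group_metrics(y_true, y_pred, groups).items():
    # Large gaps between groups are a signal to investigate the model and data.
    print(f"group {group}: accuracy={acc:.2f}, TPR={tpr:.2f}")
```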
Scalable and accurate deep learning for electronic health records
Alvin Rishi Rajkomar
Eyal Oren
Nissan Hajaj
Mila Hardt
Peter J. Liu
Xiaobing Liu
Jake Marcus
Patrik Per Sundberg
Kun Zhang
Yi Zhang
Gerardo Flores
Gavin Duggan
Jamie Irvine
Kurt Litsch
Alex Mossin
Justin Jesada Tansuwan
De Wang
Dana Ludwig
Samuel Volchenboum
Kat Chou
Michael Pearson
Srinivasan Madabushi
Nigam Shah
Atul Butte
npj Digital Medicine (2018)
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient’s chart.
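The representation described here can be pictured as flattening each patient's raw record into a single time-ordered event sequence. The sketch below builds such a sequence from a few mocked-up, FHIR-flavored resources; the field names and event encoding are simplified assumptions, not the paper's actual pipeline.

```python
# Simplified illustration of the idea above: represent a patient's raw record
# as one time-ordered sequence of events, without hand-curating predictors.
# Resources and fields below are mocked-up, FHIR-flavored examples only.
from datetime import datetime

patient_record = [
    {"resource": "Encounter", "time": "2018-01-02T08:15", "code": "inpatient_admission"},
    {"resource": "Observation", "time": "2018-01-02T09:00", "code": "heart_rate", "value": 112},
    {"resource": "MedicationRequest", "time": "2018-01-02T09:30", "code": "ceftriaxone_1g_iv"},
    {"resource": "Observation", "time": "2018-01-02T13:00", "code": "temperature_c", "value": 38.9},
    {"resource": "Note", "time": "2018-01-02T14:00", "code": "progress_note",
     "value": "Pneumonia, started antibiotics."},
]

def to_event_sequence(record):
    """Sort raw resources by timestamp and emit one token-like event per entry."""
    ordered = sorted(record, key=lambda r: datetime.fromisoformat(r["time"]))
    return [
        f'{r["resource"]}:{r["code"]}' + (f'={r["value"]}' if "value" in r else "")
        for r in ordered
    ]

# This sequence (not a curated table of predictors) is what a sequence model
# would consume to predict outcomes such as mortality or readmission.
print(to_event_sequence(patient_record))
```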
GUIDELINE TITLE: Mechanical Ventilation in Adult Patients With Acute Respiratory Distress Syndrome
DEVELOPER: American Thoracic Society (ATS)/European Society of Intensive Care Medicine (ESICM)/Society of Critical Care Medicine (SCCM)
RELEASE DATE: May 1, 2017
TARGET POPULATION: Hospitalized adults with acute respiratory distress syndrome (ARDS).
SELECTED MAJOR RECOMMENDATIONS:
For all patients with ARDS:
• Use lower tidal volumes of 4 to 8 mL/kg per breath, calculated using predicted body weight (PBW) (strong recommendation; moderate confidence in effect estimate); a worked calculation of PBW and the resulting volume range follows this list.
• Use lower inspiratory pressures, targeting a plateau pressure <30 cm H2O (strong recommendation; moderate confidence).
For patients with severe ARDS (PaO2/FIO2 ratio <100):
• Use prone positioning for at least 12 h/d (strong recommendation; moderate confidence).
• Do not routinely use high-frequency oscillatory ventilation (strong recommendation; high confidence).
• Additional evidence is needed to recommend for or against the use of extracorporeal membrane oxygenation (ECMO) in severe ARDS.
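As a worked example of the tidal-volume recommendation above, the calculation below applies the widely used ARDSNet predicted-body-weight equations and the 4 to 8 mL/kg range to a hypothetical patient; the height, sex, and resulting numbers are illustrative only.

```python
# Worked example of the tidal-volume recommendation: tidal volume is dosed per
# kilogram of PREDICTED body weight (derived from height and sex), not actual
# weight. Formula constants are the widely used ARDSNet equations; the example
# patient is hypothetical.

def predicted_body_weight_kg(height_cm: float, sex: str) -> float:
    """ARDSNet PBW: 50 (male) or 45.5 (female) + 0.91 * (height_cm - 152.4)."""
    base = 50.0 if sex == "male" else 45.5
    return base + 0.91 * (height_cm - 152.4)

def tidal_volume_range_ml(pbw_kg: float, low=4, high=8):
    """Lower-tidal-volume range of 4 to 8 mL per kg of predicted body weight."""
    return low * pbw_kg, high * pbw_kg

pbw = predicted_body_weight_kg(height_cm=175, sex="male")   # about 70.6 kg
low_ml, high_ml = tidal_volume_range_ml(pbw)                # about 282 to 565 mL
print(f"PBW {pbw:.1f} kg -> target tidal volume {low_ml:.0f}-{high_ml:.0f} mL per breath")
```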