Deciphering clinical abbreviations with a privacy protecting machine learning system

Alvin Rajkomar; Eric Loreaux; Yuchen Liu; Jonas Kemp; Benny Li; Ming-Jun Chen; Yi Zhang; Afroz Mohiuddin; Juraj Gottweis

Nature Communications (2022)

Abstract

Physicians write clinical notes with abbreviations and shorthand that are difficult to decipher. Abbreviations can be clinical jargon (writing “HIT” for “heparin induced thrombocytopenia”), ambiguous terms that require expertise to disambiguate (using “MS” for “multiple sclerosis” or “mental status”), or domain-specific vernacular (“cb” for “complicated by”). Here we train machine learning models on public web data to decode such text by replacing abbreviations with their meanings. We report a single translation model that simultaneously detects and expands thousands of abbreviations in real clinical notes, with accuracies ranging from 92.1% to 97.1% on multiple external test datasets. The model equals or exceeds the performance of board-certified physicians (97.6% vs. 88.7% total accuracy). Our results demonstrate a general method to contextually decipher abbreviations and shorthand that is built without any privacy-compromising data.
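To make the task concrete, the sketch below shows what "detect and expand, using context" means for the abstract's own examples (HIT, MS, cb). It is a toy dictionary-plus-cue-word baseline, not the authors' translation model, which learns context from data rather than from hand-written rules. The `LEXICON` table, its cue words, and the `expand` function are hypothetical and exist only for illustration.

```python
import re

# Hypothetical toy lexicon (not from the paper): abbreviation -> list of
# (expansion, context cue words). Unambiguous entries carry no cues.
LEXICON = {
    "HIT": [("heparin induced thrombocytopenia", set())],
    "cb": [("complicated by", set())],
    "MS": [
        ("multiple sclerosis", {"relapsing", "demyelinating", "lesions", "neurology"}),
        ("mental status", {"altered", "confused", "baseline", "oriented"}),
    ],
}


def expand(note: str) -> str:
    """Replace each known abbreviation with the expansion whose cue words best match the note."""
    # Lowercased bag of words from the whole note serves as the "context".
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", note)}

    def choose(match: re.Match) -> str:
        options = LEXICON[match.group(0)]
        # Pick the expansion with the most cue words present; max() keeps the first on ties.
        best = max(options, key=lambda opt: len(opt[1] & words))
        return best[0]

    # Case-sensitive, whole-token match on any lexicon key (detection step).
    pattern = r"\b(" + "|".join(map(re.escape, LEXICON)) + r")\b"
    return re.sub(pattern, choose, note)


print(expand("Pt with altered MS, course cb HIT"))
print(expand("MS with new demyelinating lesions"))
```

The same token "MS" expands differently in the two notes because the surrounding words differ. A fixed cue list like this breaks down quickly on real notes, which is the gap a learned, context-aware model is meant to close.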

Research Areas

Machine intelligence