Google Research

Human-centric Metric for Accelerating Pathology Reports Annotation

arXiv (2019)


Pathology reports written by physicians contain useful class information, such as the main organ type and disease type. This information can be used for large-scale statistical analysis or for labelling data in other modalities, such as pathology slices (images). However, manually classifying a huge number of reports across multiple tasks is very inefficient, and the reports are hard for non-professionals to read. In this paper, we investigate a general-purpose NLP model, BERT, for multilabel text classification. We test it on five different classification tasks and achieve good discrimination. More importantly, we evaluate it in a practical setting by measuring how much human annotation labor can be saved and how well it performs on the automatically classified cases.
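
The abstract does not spell out how the labor-saving measurement is computed. One plausible reading, sketched below purely as an assumption, is a confidence-threshold triage: reports whose predicted class confidence meets a threshold are labelled automatically, and the rest go to a human annotator. The function name `triage_metrics`, the threshold value, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def triage_metrics(
    confidences: List[float],
    predictions: List[str],
    labels: List[str],
    threshold: float,
) -> Tuple[float, Optional[float]]:
    """Return (fraction of reports labelled automatically, accuracy on those reports).

    Assumption: a report is auto-labelled when the model's confidence in its
    predicted class is at least `threshold`; otherwise it is sent to a human.
    """
    # Keep only the (prediction, true label) pairs that pass the threshold.
    auto = [
        (pred, true)
        for conf, pred, true in zip(confidences, predictions, labels)
        if conf >= threshold
    ]
    total = len(labels)
    # Share of reports a human no longer has to annotate.
    labor_saved = len(auto) / total if total else 0.0
    # Accuracy restricted to auto-labelled reports; None if nothing was auto-labelled.
    auto_accuracy = (
        sum(pred == true for pred, true in auto) / len(auto) if auto else None
    )
    return labor_saved, auto_accuracy


# Toy, made-up data for illustration only.
confidences = [0.99, 0.95, 0.60, 0.97, 0.55]
predictions = ["lung", "breast", "colon", "lung", "breast"]
labels = ["lung", "breast", "lung", "colon", "breast"]

labor_saved, auto_accuracy = triage_metrics(confidences, predictions, labels, 0.9)
print(labor_saved, auto_accuracy)
```

Raising the threshold in this sketch trades labor saved for accuracy on the automatically classified cases, which is the kind of trade-off the evaluation described above is concerned with.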
