Introduction: Auto-charting -- creation structured sections of clinical notes generated directly from a patient-doctor encounter -- holds promise to lift documentation burden from physicians. However, clinicians exercise professional judgement in what and how to document, and it is unknown if a machine learning (ML) model could assist with these tasks.
Objective: Build a ML model to extract symptoms and status (i.e. experienced, not-experienced, not relevant for note) from transcripts of patient-doctor encounters and assess performance on common symptoms and conversations in which a human interpreterscribe is not used.
Methods: We generated a ML model to auto-generate a review of systems (ROS) from transcripts of 90,000 de-identified medical encounters. 2950 transcripts were labeled by medical scribes to identify 171 common symptoms. Model accuracy was stratified by how clearly a symptom was mentioned in conversation for 800 snippets, which was assessed by a formal rating system termed conversational clarity. The model was also qualitatively assessed in a variety of conversational motifs.
Results: Overall, the model had a sensitivity of 0.71 of matching the exact symptom labeled by a human with a positive predictive value of 0.69. Model sensitivity was associated with the clarity of a conversational (p<0.0001). 39.5% (316/800) snippets of common symptoms contained symptoms mentioned with high clarity, and in this group, the sensitivity of the model was 0.91. The model was robust to a variety of conversational motifs (e.g. detecting symptoms mentioned in colloquial ways).
Conclusions: Auto-generating a review of systems is feasible across a wide-range symptoms that are commonly discussed in doctor-patient encounter