Ayelet Benjamini
Ayelet Benjamini earned her BSc in Physics and Chemistry at the Hebrew University of Jerusalem and holds a PhD in Computational Chemistry from the University of California, Berkeley.
At Google, Ayelet is focused on Climate AI research across multiple disciplines including remote sensing, geospatial modeling and time series prediction.
Authored Publications
Structured Understanding of Assessment and Plans in Clinical Documentation
Doron Yaya-Stupp
Ronnie Barequet
I-Ching Lee
Eyal Oren
Eran Ofek
Alvin Rajkomar
medRxiv (2022)
Physicians record their detailed thought processes about diagnoses and treatments as unstructured text in a section of a clinical note called the assessment and plan. This information is more clinically rich than the structured billing codes assigned for an encounter, but it is harder to reliably extract given the complexity of clinical language and documentation habits. We describe and release a dataset containing annotations of 579 admission and progress notes from the publicly available and de-identified MIMIC-III ICU dataset, with over 30,000 labels identifying active problems, their assessment, and the category of associated action items (e.g. medication, lab test). We also propose deep-learning-based models that approach human performance, with an F1 score of 0.88. We found that by employing weak supervision and domain-specific data augmentation, we could improve generalization across departments and reduce the number of human-labeled notes without sacrificing performance.
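The weak-supervision idea mentioned above can be sketched with simple labeling functions: heuristics that assign a noisy action-item category to a text span, or abstain. This is a minimal illustration of the general technique, not the paper's implementation; the keyword lists and category names are invented for the example.

```python
import re

# Hypothetical keyword lists per action-item category (illustrative only).
ACTION_KEYWORDS = {
    "MEDICATION": ["start", "continue", "hold", "titrate", "mg"],
    "LAB_TEST": ["check", "recheck", "cbc", "bmp", "troponin"],
    "IMAGING": ["ct", "mri", "x-ray", "ultrasound"],
}

def weak_label(span):
    """Return the first category whose keywords match the span, else abstain."""
    tokens = set(re.findall(r"[a-z\-]+", span.lower()))
    for category, keywords in ACTION_KEYWORDS.items():
        if tokens & set(keywords):
            return category
    return None  # abstain: no weakly labeled example is produced

print(weak_label("continue lisinopril 10 mg daily"))  # MEDICATION
print(weak_label("recheck troponin in 6 hours"))      # LAB_TEST
```

Spans labeled this way can seed a training set for a neural model, trading label noise for coverage.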
Building a Clinically-Focused Problem List From Medical Notes
Birju Patel
Cathy Cheung
Liwen Xu
Peter Clardy
Rachana Fellinger
LOUHI 2022: The 13th International Workshop on Health Text Mining and Information Analysis (2022)
Clinical notes often contain vital information not observed in other structured data, but their unstructured nature can lead to critical patient-related information being lost. To ensure this valuable information is utilized for patient care, algorithms that summarize notes into a problem list are often proposed. Focusing on identifying medically relevant entities in the free-form text, these solutions are often detached from a canonical ontology and do not allow downstream use of the detected text spans. As a solution, we present here a system for generating a canonical problem list from medical notes, consisting of two major stages. In the first stage, annotation, we use a transformer model to detect all clinical conditions mentioned in a single note. These clinical conditions are then grounded to a predefined ontology and linked to spans in the text. In the second stage, summarization, we aggregate over the set of clinical conditions detected across all of the patient's notes, and produce a concise patient summary that organizes their important conditions.
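The two-stage pipeline can be sketched as follows. The detection stage is stubbed with exact string matching purely to keep the example self-contained; the paper uses a transformer for this stage, and the toy ontology entries here are invented.

```python
# Hypothetical surface-form -> canonical-condition ontology (illustrative).
ONTOLOGY = {
    "htn": "Hypertension",
    "hypertension": "Hypertension",
    "type 2 diabetes": "Type 2 diabetes mellitus",
}

def annotate(note):
    """Stage 1: detect condition mentions, ground them, keep their spans."""
    text = note.lower()
    mentions = []
    for surface, canonical in ONTOLOGY.items():
        start = text.find(surface)
        if start != -1:
            mentions.append({"span": (start, start + len(surface)),
                             "condition": canonical})
    return mentions

def summarize(notes):
    """Stage 2: aggregate grounded conditions across all of a patient's notes."""
    problems = []
    for note in notes:
        for mention in annotate(note):
            if mention["condition"] not in problems:
                problems.append(mention["condition"])
    return problems

notes = ["Pt with HTN, poorly controlled.",
         "Type 2 diabetes; continue metformin. HTN stable."]
print(summarize(notes))  # ['Hypertension', 'Type 2 diabetes mellitus']
```

Because each detected condition stays grounded to the ontology and linked to a text span, downstream consumers can both deduplicate problems and trace each one back to its source sentence.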
Section Classification in Clinical Notes with Multi-task Transformers
Fan Zhang
LOUHI 2022: The 13th International Workshop on Health Text Mining and Information Analysis (2022)
Clinical notes are the backbone of electronic health records, often containing vital information not observed in other structured data. Unfortunately, the unstructured nature of clinical notes can lead to critical patient-related information being lost. Algorithms that organize clinical notes into distinct sections are often proposed to help medical professionals better access information in a given note. These algorithms, however, typically assume a given partition over the note and only classify section types given this information. In this paper, we propose a multi-task solution for note sectioning, where one model both identifies context changes and labels each section with its medically relevant title. Results on in-distribution (MIMIC-III) and out-of-distribution (private held-out) datasets reveal that our multi-task approach can successfully identify note sections across different hospital systems.
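The multi-task formulation, one model emitting both a boundary decision and a section label per line, can be illustrated with a trivial stand-in. The transformer is replaced by a keyword rule so the sketch stays self-contained; the header names are invented, not the paper's label set.

```python
# Hypothetical header -> section-label map (illustrative only).
SECTION_HEADERS = {
    "chief complaint": "CHIEF_COMPLAINT",
    "assessment and plan": "ASSESSMENT_AND_PLAN",
    "medications": "MEDICATIONS",
}

def section_note(lines):
    """For each line, jointly output (starts_new_section, section_label),
    mirroring the two heads of a multi-task sectioning model."""
    outputs, current = [], "UNKNOWN"
    for line in lines:
        key = line.strip().rstrip(":").lower()
        starts = key in SECTION_HEADERS   # task 1: boundary detection
        if starts:
            current = SECTION_HEADERS[key]
        outputs.append((starts, current))  # task 2: section labeling
    return outputs

note = ["Chief Complaint:", "Chest pain.", "Medications:", "Aspirin."]
print(section_note(note))
```

In the real model, both outputs come from a shared encoder, so the boundary and labeling tasks regularize each other instead of being solved by brittle header matching as here.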
Learning and Evaluating a Differentially Private Pre-trained Language Model
Shlomo Hoory
Avichai Tendler
Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 1178-1189
Contextual language models have led to significantly better results on a plethora of language understanding tasks, especially when pre-trained on the same data as the downstream task. While this additional pre-training usually improves performance, it often leads to information leakage and therefore risks the privacy of individuals mentioned in the training data. One method to guarantee the privacy of such individuals is to train a differentially private model, but this usually comes at the expense of model performance. Moreover, given a privacy parameter $\epsilon$, it is hard to tell what effect training had on the learned representation and whether it maintained relevant information while improving privacy. To improve privacy and guide future practitioners and researchers, we demonstrate here how to train a differentially private pre-trained language model (i.e., BERT) with a privacy guarantee of $\epsilon=0.5$ with only a small degradation in performance. We experiment on a dataset of clinical notes with a model trained on an entity extraction (EE) task, and compare it to a similar model trained without differential privacy. Finally, we present a series of experiments showing how to interpret the differentially private representation and understand the information lost and maintained in this process.
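The standard recipe behind differentially private training, per-example gradient clipping followed by Gaussian noise (DP-SGD), can be sketched for a linear model. This is a minimal illustration, not the paper's BERT training setup; the hyperparameters are placeholders, and a real run would use a privacy accountant to map the noise multiplier to a concrete $\epsilon$.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.1, rng=None):
    """One DP-SGD step for squared-error linear regression (illustrative)."""
    rng = rng or np.random.default_rng(0)
    # Per-example gradients, shape (n, d).
    grads = 2 * (X @ w - y)[:, None] * X
    # Clip each example's gradient to L2 norm <= clip, bounding its influence.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip)
    # Sum, add Gaussian noise calibrated to the clip bound, then average.
    noisy_sum = grads.sum(axis=0) + rng.normal(0, noise_mult * clip, w.shape)
    return w - lr * noisy_sum / len(X)
```

Because each example's gradient contribution is bounded before noise is added, the noise scale needed for a given privacy guarantee is independent of any single record's content; the performance cost the abstract mentions comes from that clipping-plus-noise distortion.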