Ayelet Benjamini

Ayelet Benjamini completed her BSc in Physics and Chemistry at the Hebrew University of Jerusalem and holds a PhD in Computational Chemistry from the University of California, Berkeley. At Google, Ayelet focuses on Climate AI research across multiple disciplines, including remote sensing, geospatial modeling, and time-series prediction.
Authored Publications
    Physicians record their detailed thought processes about diagnoses and treatments as unstructured text in a section of a clinical note called the assessment and plan. This information is more clinically rich than the structured billing codes assigned for an encounter, but it is harder to extract reliably given the complexity of clinical language and documentation habits. We describe and release a dataset containing annotations of 579 admission and progress notes from the publicly available and de-identified MIMIC-III ICU dataset, with over 30,000 labels identifying active problems, their assessment, and the category of associated action items (e.g. medication, lab test). We also propose deep-learning-based models that approach human performance, with an F1 score of 0.88. We found that by employing weak supervision and domain-specific data augmentation, we could improve generalization across departments and reduce the number of human-labeled notes without sacrificing performance.
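To give a rough feel for the weak-supervision idea mentioned in this abstract, here is a minimal sketch (hypothetical; not the paper's code, label set, or actual weak-supervision sources) that uses keyword heuristics to assign noisy action-item categories to sentences from an assessment-and-plan section. In practice, such weak labels would seed training for a neural tagger rather than serve as final predictions.

```python
import re

# Hypothetical keyword heuristics; the paper's real label set and
# weak-supervision sources are more elaborate than these rules.
ACTION_PATTERNS = {
    "medication": re.compile(r"\b(start|continue|hold|increase|taper)\b.*\b(mg|dose|IV|PO)\b", re.I),
    "lab_test": re.compile(r"\b(check|repeat|monitor|trend)\b.*\b(CBC|BMP|lactate|troponin|cultures?)\b", re.I),
    "imaging": re.compile(r"\b(obtain|order)\b.*\b(CT|MRI|x-?ray|ultrasound)\b", re.I),
    "consult": re.compile(r"\bconsult\b", re.I),
}

def weak_label(sentence: str) -> str:
    """Return a noisy action-item category, or 'abstain' when no rule fires."""
    for category, pattern in ACTION_PATTERNS.items():
        if pattern.search(sentence):
            return category
    return "abstain"

plan = [
    "Continue ceftriaxone 1g IV daily.",
    "Repeat CBC and lactate in the morning.",
    "Consult cardiology regarding elevated troponin.",
]
for sentence in plan:
    print(weak_label(sentence), "-", sentence)
```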
    Clinical notes often contain vital information not found in other structured data, but their unstructured nature can lead to critical patient-related information being lost. To ensure this valuable information is used for patient care, algorithms that summarize notes into a problem list are often proposed. Because these solutions focus on identifying medically relevant entities in the free-form text, they are often detached from a canonical ontology and do not allow downstream use of the detected text spans. As a solution, we present a system for generating a canonical problem list from medical notes, consisting of two major stages. In the first stage, annotation, we use a transformer model to detect all clinical conditions mentioned in a single note. These clinical conditions are then grounded to a predefined ontology and linked to spans in the text. In the second stage, summarization, we aggregate over the set of clinical conditions detected across all of the patient's notes and produce a concise patient summary that organizes their important conditions.
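The two-stage shape described here can be illustrated with a toy mock-up. The sketch below (all names, the string-matching grounding, and the miniature ontology are illustrative assumptions; the paper uses a transformer annotator grounded to a predefined clinical ontology) shows stage 1 linking note spans to canonical concepts and stage 2 aggregating those concepts across notes into a problem list.

```python
from collections import defaultdict

# Toy ontology mapping surface forms to made-up concept IDs; a real system
# would ground to a clinical ontology with a learned model, not string match.
ONTOLOGY = {
    "htn": "C-HYPERTENSION",
    "hypertension": "C-HYPERTENSION",
    "elevated blood pressure": "C-HYPERTENSION",
    "dm2": "C-TYPE2-DIABETES",
    "type 2 diabetes": "C-TYPE2-DIABETES",
}

def ground_mentions(note: str) -> list[tuple[str, int, int]]:
    """Stage 1 (mock): link text spans to canonical concepts by string match."""
    note_lower = note.lower()
    found = []
    for surface, concept in ONTOLOGY.items():
        start = note_lower.find(surface)
        if start != -1:
            found.append((concept, start, start + len(surface)))
    return found

def summarize(notes: list[str]) -> dict[str, int]:
    """Stage 2: aggregate grounded concepts across all notes into a problem list."""
    counts = defaultdict(int)
    for note in notes:
        for concept, _, _ in ground_mentions(note):
            counts[concept] += 1
    return dict(sorted(counts.items(), key=lambda kv: -kv[1]))

notes = [
    "Pt with HTN, elevated blood pressure overnight.",
    "Hx of type 2 diabetes; HTN controlled on lisinopril.",
]
print(summarize(notes))  # {'C-HYPERTENSION': 3, 'C-TYPE2-DIABETES': 1}
```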
    Section Classification in Clinical Notes with Multi-task Transformers
    Fan Zhang
    LOUHI 2022: The 13th International Workshop on Health Text Mining and Information Analysis (2022)
    Clinical notes are the backbone of electronic health records, often containing vital information not found in other structured data. Unfortunately, the unstructured nature of clinical notes can lead to critical patient-related information being lost. Algorithms that organize clinical notes into distinct sections are often proposed to allow medical professionals to better access information in a given note. These algorithms, however, often assume a given partition over the note and only classify section types given this information. In this paper, we propose a multi-task solution for note sectioning, where one model can identify context changes and label each section with its medically relevant title. Results on in-distribution (MIMIC-III) and out-of-distribution (private held-out) datasets reveal that our multi-task approach can successfully identify note sections across different hospital systems.
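As a minimal sketch of the multi-task shape this abstract describes, the code below wires a shared encoder to two heads: one that flags whether a sentence starts a new section and one that predicts the section's title. The encoder, dimensions, and label set are placeholders, not the paper's architecture, which builds on a pre-trained transformer.

```python
import torch
import torch.nn as nn

SECTION_TYPES = ["history", "medications", "assessment_and_plan"]  # placeholder labels

class MultiTaskSectioner(nn.Module):
    """Shared encoder with two heads: section-boundary detection and section typing."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        # Stand-in encoder over sentence embeddings; the paper uses a
        # pre-trained transformer rather than a small randomly initialized one.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.boundary_head = nn.Linear(hidden, 2)                # new section? yes/no
        self.type_head = nn.Linear(hidden, len(SECTION_TYPES))   # section title

    def forward(self, sentence_embeddings: torch.Tensor):
        # sentence_embeddings: (batch, num_sentences, hidden)
        h = self.encoder(sentence_embeddings)
        return self.boundary_head(h), self.type_head(h)

model = MultiTaskSectioner()
x = torch.randn(1, 10, 256)  # one note, ten sentence embeddings
boundary_logits, type_logits = model(x)
# Joint loss over both tasks; real labels would come from annotated notes.
loss = (nn.functional.cross_entropy(boundary_logits.flatten(0, 1), torch.randint(2, (10,)))
        + nn.functional.cross_entropy(type_logits.flatten(0, 1), torch.randint(3, (10,))))
loss.backward()
```

Sharing the encoder lets the boundary and titling tasks regularize each other, which is the usual motivation for a multi-task setup like the one the paper proposes.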
    Learning and Evaluating a Differentially Private Pre-trained Language Model
    Shlomo Hoory
    Avichai Tendler
    Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 1178-1189
    Contextual language models have led to significantly better results on a plethora of language-understanding tasks, especially when pre-trained on the same data as the downstream task. While this additional pre-training usually improves performance, it often leads to information leakage and therefore risks the privacy of individuals mentioned in the training data. One method to guarantee the privacy of such individuals is to train a differentially private model, but this usually comes at the expense of model performance. Moreover, given a privacy parameter $\epsilon$, it is hard to tell what effect training had on the learned representation and whether it maintained relevant information while improving privacy. To improve privacy and guide future practitioners and researchers, we demonstrate here how to train a differentially private pre-trained language model (i.e., BERT) with a privacy guarantee of $\epsilon=0.5$ with only a small degradation in performance. We experiment on a dataset of clinical notes, training a model on an entity extraction (EE) task, and compare it to a similar model trained without differential privacy. Finally, we present a series of experiments showing how to interpret the differentially private representation and understand the information lost and maintained in this process.
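The standard mechanism behind differentially private training of this kind is DP-SGD: clip each example's gradient to a fixed norm, then add calibrated Gaussian noise before the optimizer step. The sketch below shows that mechanism on a toy model; the clipping norm and noise multiplier are illustrative, and mapping a noise multiplier to a guarantee such as $\epsilon=0.5$ requires a privacy accountant (as provided by libraries like Opacus or TensorFlow Privacy), which is omitted here.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)              # toy stand-in for a BERT-style model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

CLIP_NORM = 1.0         # per-example gradient norm bound C (illustrative)
NOISE_MULTIPLIER = 1.1  # sigma; an accountant maps (sigma, steps, batch) to epsilon

def dp_sgd_step(xb: torch.Tensor, yb: torch.Tensor):
    """One DP-SGD step: per-example gradient clipping, then Gaussian noise."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):                         # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, CLIP_NORM / (float(norm) + 1e-12))  # clip to norm C
        for s, g in zip(summed, grads):
            s += g * scale
    model.zero_grad()
    for p, s in zip(model.parameters(), summed):     # add noise, average, step
        noise = torch.normal(0.0, NOISE_MULTIPLIER * CLIP_NORM, size=s.shape)
        p.grad = (s + noise) / len(xb)
    optimizer.step()

xb, yb = torch.randn(8, 16), torch.randint(2, (8,))
dp_sgd_step(xb, yb)
```

The per-example loop is the simplest correct way to bound each individual's contribution; production implementations vectorize this step, but the privacy argument is the same.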