Mathias Fleck
Authored Publications
    Conversational AI in health: Design considerations from a Wizard-of-Oz dermatology case study with users, clinicians and a medical LLM
    Brenna Li
    Amy Wang
    Patricia Strachan
    Julie Anne Seguin
    Sami Lachgar
    Karyn Schroeder
    Renee Wong
    Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, pp. 10
    Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact with either a clinician agent (portrayed by a dermatologist) or an LLM agent (supervised by a dermatologist) via synchronous multimodal chat. In both conditions, participants found the conversation helpful in understanding their medical situation and alleviating their concerns. Through qualitative coding of the conversation transcripts, we provide insight into the importance of empathy and effective information-seeking. We conclude with design considerations for future AI-based conversational agents in healthcare settings.
    Background: Although effective mental health treatments exist, the ability to match individuals to optimal treatments is poor, and timely assessment of response is difficult. One reason for these challenges is the lack of objective measurement of psychiatric symptoms. Sensors and active tasks recorded by smartphones provide a low-burden, low-cost, and scalable way to capture real-world data from patients that could augment clinical decision-making and move the field of mental health closer to measurement-based care.
    Objective: This study tests the feasibility of a fully remote study on individuals with self-reported depression using an Android-based smartphone app to collect subjective and objective measures associated with depression severity. The goals of this pilot study are to develop an engaging user interface for high task adherence through user-centered design; test the quality of collected data from passive sensors; start building clinically relevant behavioral measures (features) from passive sensors and active inputs; and preliminarily explore connections between these features and depression severity.
    Methods: A total of 600 participants were asked to download the study app to join this fully remote, observational 12-week study. The app passively collected 20 sensor data streams (eg, ambient audio level, location, and inertial measurement units), and participants were asked to complete daily survey tasks, weekly voice diaries, and the clinically validated Patient Health Questionnaire (PHQ-9) self-survey. Pairwise correlations between derived behavioral features (eg, weekly minutes spent at home) and PHQ-9 were computed. Using these behavioral features, we also constructed an elastic net penalized multivariate logistic regression model predicting depressed versus nondepressed PHQ-9 scores (ie, dichotomized PHQ-9).
    Results: A total of 415 individuals logged into the app. Over the course of the 12-week study, these participants completed 83.35% (4151/4980) of the PHQ-9s. Applying data sufficiency rules for minimally necessary daily and weekly data resulted in 3779 participant-weeks of data across 384 participants. Using a subset of 34 behavioral features, we found that 11 features showed a significant (P<.001, Benjamini-Hochberg adjusted) Spearman correlation with weekly PHQ-9, including voice diary–derived word sentiment and ambient audio levels. Restricting the data to those cases in which all 34 behavioral features were present, we had available 1013 participant-weeks from 186 participants. The logistic regression model predicting depression status resulted in a 10-fold cross-validated mean area under the curve of 0.656 (SD 0.079).
    Conclusions: This study finds a strong proof of concept for the use of a smartphone-based assessment of depression outcomes. Behavioral features derived from passive sensors and active tasks show promising correlations with a validated clinical measure of depression (PHQ-9). Future work is needed to increase scale that may permit the construction of more complex (eg, nonlinear) predictive models and better handle data missingness.
    JMIR Ment Health 2021;8(8):e27589
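    The Results above mention a Benjamini-Hochberg adjustment and an 83.35% PHQ-9 completion rate. The sketch below is only an illustration of those two calculations, not the study's code: the function name `benjamini_hochberg` and the toy p-values are hypothetical, and reading the denominator 4980 as 415 participants times 12 weekly PHQ-9s is an assumption based on the figures in the abstract.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Hypothetical helper for illustration; not taken from the study.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (hypothetical, not study data).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)

# Completion rate from the abstract: 4151 completed out of 4980 expected.
# Assumption: 4980 = 415 participants x 12 weekly PHQ-9s.
expected_phq9 = 415 * 12
completion_pct = round(100 * 4151 / expected_phq9, 2)
```

    A feature would count as significant in the abstract's sense when its adjusted p-value falls below .001.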
    This case study describes a usability assessment of lightweight automated external defibrillators (AEDs) in an exploration of how AEDs might safely integrate within an unmanned aerial vehicle (UAV) delivery system to rapidly treat victims of cardiac arrest. Untrained laypersons were asked to use an AED in a simulated cardiac arrest scenario in either standard or UAV-delivery scenarios. The impact of device-specific customization of emergency operator instruction was also evaluated.