 
                Ali Heydari
Research Areas
      Authored Publications
    
  
  
  
    
    
  
      
        
        
    
    
        
          
            
              The Anatomy of a Personal Health Agent
            
          
        
        
          
            
              
                
                  
                    
                
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
    
    
    
    
    
                      
              Ahmed Metwally, Ken Gu, Jiening Zhan, Kumar Ayush, Hong Yu, Amy Lee, Qian He, Zhihan Zhang, Isaac Galatzer-Levy, Xavi Prieto, Andrew Barakat, Ben Graef, Yuzhe Yang, Daniel McDuff, Brent Winslow, Shwetak Patel, Girish Narayanswamy, Conor Heneghan, Max Xu, Jacqueline Shreibati, Mark Malhotra, Orson Xu, Tim Althoff, Tony Faranesh, Nova Hammerquist, Vidya Srinivas
                      
                    
                  
              
            
          
          
          
          
            arXiv (2025)
          
          
        
        
        
          
          
          
              Health is a fundamental pillar of human wellness, and rapid advances in large language models (LLMs) have driven the development of a new generation of health agents. However, how to meet the diverse needs of individuals in daily non-clinical settings remains underexplored. In this work, we aim to build a comprehensive personal health assistant that can reason about multimodal data from everyday consumer devices and personal health records. To understand end users’ needs when interacting with such an assistant, we conducted an in-depth analysis of user query data, alongside qualitative insights from users and experts gathered through a user-centered design process. Based on these findings, we identified three major categories of consumer health needs, each of which is supported by a specialist subagent: (1) a data science agent that analyzes both personal and population-level time-series wearable and health record data to provide numerical health insights, (2) a health domain expert agent that integrates users’ health and contextual data to generate accurate, personalized insights based on medical and contextual user knowledge, and (3) a health coach agent that synthesizes data insights, drives multi-turn user interactions and interactive goal setting, guides users with a specified psychological strategy, and tracks their progress. Furthermore, we propose and develop a multi-agent framework, Personal Health Insight Agent Team (PHIAT), that enables dynamic, personalized interactions to address individual health needs. To evaluate these individual agents and the multi-agent system, we develop a set of N benchmark tasks and conduct both automated and human evaluations, involving hundreds of hours of evaluation from health experts and hundreds of hours of evaluation from end users.
Our work establishes a strong foundation towards the vision of a personal health assistant accessible to everyone in the future and represents the most comprehensive evaluation of a consumer AI health agent to date.
              
  
          
        
      
    
        
          
            
              RADAR: Benchmarking Language Models on Imperfect Tabular Data
            
          
        
        
          
            
              
                
                  
                    
                
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
    
    
    
    
    
                      
              Ken Gu, Kumar Ayush, Hong Yu, Zhihan Zhang, Yuzhe Yang, Shwetak Patel, Max Xu, Mark Malhotra, Orson Xu, Evelyn Zhang, Tim Althoff
                      
                    
                  
              
            
          
          
          
          
            2025
          
          
        
        
        
          
          
          
              Language models (LMs) are increasingly being deployed to perform autonomous data analyses, yet their data awareness (the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies) remains under-explored. These artifacts are common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data awareness on tabular data. RADAR introduces programmatic perturbations for each unique query-table pair, enabling targeted evaluation of model behavior. RADAR comprises 2500 queries for data analysis across 55 datasets spanning 20 domains and 5 data awareness dimensions. In addition to evaluating artifact handling, RADAR systematically varies table size to study how reasoning performance scales with input length. In our evaluation, we identify fundamental gaps in language models' ability to perform reliable, data-aware analyses. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.
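              As a rough illustration of what a programmatic perturbation on a table might look like, the sketch below injects missing values and an outlier into a small table. This is not RADAR's implementation: the function names, the list-of-dicts table layout, and the `steps` column are all hypothetical, chosen only to make the idea concrete.

```python
import copy
import random


def inject_missing(table, column, fraction, seed=0):
    """Return a copy of `table` with a fraction of `column` values set to None."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    perturbed = copy.deepcopy(table)
    n_missing = max(1, round(fraction * len(perturbed)))
    for idx in rng.sample(range(len(perturbed)), n_missing):
        perturbed[idx][column] = None
    return perturbed


def inject_outlier(table, column, row, scale=100.0):
    """Return a copy of `table` with one value multiplied by `scale`."""
    perturbed = copy.deepcopy(table)
    perturbed[row][column] = perturbed[row][column] * scale
    return perturbed


def mean_ignoring_missing(table, column):
    """A data-aware mean: skip missing entries instead of failing on them."""
    values = [r[column] for r in table if r[column] is not None]
    return sum(values) / len(values)


# Hypothetical clean table: ten days of step counts.
clean = [{"day": d, "steps": 8000 + 100 * d} for d in range(10)]
with_missing = inject_missing(clean, "steps", fraction=0.2)
with_outlier = inject_outlier(clean, "steps", row=3)
```

              Pairing each clean table with its perturbed variants makes it possible to check whether an analysis notices the artifact, for example by comparing a naive mean over `with_outlier` against the mean over `clean`.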
              
  
          
        
      
    
        
          
            
              A Scalable Framework for Evaluating Health Language Models
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
              Neil Mallinar, Tony Faranesh, Brent Winslow, Nova Hammerquist, Ben Graef, Cathy Speed, Mark Malhotra, Shwetak Patel, Xavi Prieto, Daniel McDuff, Ahmed Metwally
                      
                    
                  
              
            
          
          
          
          
             (2025)
          
          
        
        
        
          
          
          
              Large language models (LLMs) have emerged as powerful tools for analyzing complex datasets. Recent studies demonstrate their potential to generate useful, personalized responses when provided with patient-specific health information that encompasses lifestyle, biomarkers, and context. As LLM-driven health applications are increasingly adopted, rigorous and efficient one-sided evaluation methodologies are crucial to ensure response quality across multiple dimensions, including accuracy, personalization, and safety. Current evaluation practices for open-ended text responses rely heavily on human experts. This approach introduces human factors, is often cost-prohibitive and labor-intensive, and hinders scalability, especially in complex domains like healthcare where response assessment requires domain expertise and must consider multifaceted patient data. In this work, we introduce Adaptive Precise Boolean rubrics: an evaluation framework that streamlines human and automated evaluation of open-ended questions by identifying gaps in model responses using a minimal set of targeted rubric questions. Our approach is based on recent work in more general evaluation settings that contrasts a smaller set of complex evaluation targets with a larger set of more precise, granular targets answerable with simple boolean responses. We validate this approach in metabolic health, a domain encompassing diabetes, cardiovascular disease, and obesity. Our results demonstrate that Adaptive Precise Boolean rubrics yield higher inter-rater agreement among expert and non-expert human evaluators, and in automated assessments, compared to traditional Likert scales, while requiring approximately half the evaluation time of Likert-based methods. This enhanced efficiency, particularly in automated evaluation and non-expert contributions, paves the way for more extensive and cost-effective evaluation of LLMs in health.
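              To make the idea of boolean rubric answers and inter-rater agreement concrete, here is a minimal sketch. The abstract does not say which agreement statistic was used, so percent agreement and Cohen's kappa are assumptions here, and the rubric questions and rater answers are invented for illustration.

```python
def rubric_score(answers):
    """Fraction of rubric questions answered 'yes' (True)."""
    return sum(answers) / len(answers)


def percent_agreement(rater_a, rater_b):
    """Share of rubric questions on which two raters gave the same answer."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on boolean answers."""
    p_observed = percent_agreement(rater_a, rater_b)
    p_yes_a = sum(rater_a) / len(rater_a)
    p_yes_b = sum(rater_b) / len(rater_b)
    # Agreement expected if both raters answered independently at their own rates.
    p_expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    if p_expected == 1.0:
        return 1.0  # both raters gave one identical constant answer
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical rubric questions for one model response.
questions = [
    "Does the response mention the user's fasting glucose value?",
    "Does the response reference the user's sleep data?",
    "Does the response recommend consulting a clinician?",
    "Does the response avoid giving a diagnosis?",
    "Does the response give a concrete next step?",
    "Does the response state a numeric goal?",
]
rater_a = [True, True, False, True, False, False]
rater_b = [True, False, False, True, False, True]

score_a = rubric_score(rater_a)
agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
```

              Each boolean question has only two possible answers, so two raters can disagree only by giving opposite answers to the same narrow question, which is one plausible reason such rubrics are easier to agree on than a multi-point Likert rating.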
              
  
          
        
      
    
        
        
          
          
          
              Blood biomarkers are an essential tool for healthcare providers to diagnose, monitor, and treat a wide range of medical conditions. Establishing personalized blood biomarker ranges is crucial for accurate disease diagnosis and management. Current clinical ranges often rely on population-level statistics, which may not adequately account for the substantial influence of inter-individual variability driven by factors such as lifestyle and genetics. In this work, we introduce a novel framework for predicting future blood biomarker values and personalized reference ranges through learned representations from lifestyle data (physical activity and sleep) and blood biomarkers. Our proposed method learns a similarity-based embedding space that aims to capture the complex relationship between biomarkers and lifestyle factors. Using UK Biobank (257K participants), our results show that our deep-learned embeddings outperform traditional and current state-of-the-art representation learning techniques in predicting clinical diagnosis. Using a subset of UK Biobank of 6440 participants who have follow-up visits, we validate that the inclusion of these embeddings and lifestyle factors directly in blood biomarker models improves the prediction of future lab values from a single lab visit. This personalized modeling approach provides a foundation for developing more accurate risk stratification tools and tailoring preventative care strategies. In clinical settings, this translates to the potential for earlier disease detection, more timely interventions, and ultimately, a shift towards personalized healthcare.
              
  
          
        
      
    
        
        
          
          
          
              Blood tests are an essential tool for healthcare providers to diagnose, monitor, and treat a wide range of medical conditions; however, quantitative approaches for personalizing such metrics are nascent and often ignore important factors such as lifestyle. Moreover, recent studies have shown that raw (untransformed) representations of health records are inadequate for constructing predictive models, especially when considering a single timepoint. In this work, we investigate the association of activity and sleep with blood test ranges, and based on our results, propose Proteus, a new deep metric learning algorithm that accounts for lifestyle. We show that Proteus significantly improves the performance of several downstream analyses, including the prediction of future health risk in currently-healthy patients using a single laboratory visit. Building upon our findings, we additionally introduce DeepRange, a novel lifestyle-informed algorithm which utilizes deep-learned embeddings for estimating personalized optimal blood test ranges. Our proposed methodology for personalized blood test ranges and single-visit health risk prediction can be readily implemented and has the potential to significantly improve health outcomes by enabling early intervention and personalized treatment.
              
  
          
        
      
    
        
        
          
          
          
              We propose a novel formulation of the triplet objective function that improves metric learning without additional sample mining or overhead costs. Our approach aims to explicitly regularize the distance between the positive and negative samples in a triplet with respect to the anchor-negative distance. As an initial validation, we show that our method (called No Pairs Left Behind [NPLB]) improves upon the traditional and current state-of-the-art triplet objective formulations on standard benchmark datasets. To show the effectiveness and potential of NPLB on complex real-world data, we evaluate our approach on a large-scale healthcare dataset (UK Biobank), demonstrating that the embeddings learned by our model significantly outperform all other current representations on tested downstream tasks. Additionally, we provide a new model-agnostic single-time health risk definition that, when used in tandem with the learned representations, achieves the most accurate prediction of subjects' future health complications. Our results indicate that NPLB is a simple yet effective framework for improving existing deep metric learning models, showcasing the potential implications of metric learning in more complex applications, especially in the biological and healthcare domains.
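              The sketch below shows one way a triplet objective can be extended with a term that ties the positive-negative distance to the anchor-negative distance. The abstract does not give the exact formula, so the penalty form used here (the difference of the two distances raised to an even power) is an assumption, and all function and parameter names are illustrative rather than taken from the authors' code.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by a margin."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


def regularized_triplet_loss(anchor, positive, negative, margin=1.0, power=2):
    """Triplet loss plus an assumed penalty on the gap between the
    positive-negative distance and the anchor-negative distance."""
    d_an = euclidean(anchor, negative)
    d_pn = euclidean(positive, negative)
    penalty = (d_pn - d_an) ** power  # an even power keeps the penalty non-negative
    return triplet_loss(anchor, positive, negative, margin) + penalty


# Toy 2-D embeddings: the standard loss is already satisfied for this triplet,
# but the positive sits closer to the negative than the anchor does.
anchor, positive, negative = (0.0, 0.0), (1.0, 0.0), (4.0, 0.0)
base = triplet_loss(anchor, positive, negative)
regularized = regularized_triplet_loss(anchor, positive, negative)
```

              In this toy case the standard loss is zero while the regularized loss is not, which illustrates how the added term can still provide a training signal for triplets that the standard objective treats as already solved.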
              
  
          
        
      
    