Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10100 publications
    Preview abstract This paper reports on disability representation in images output from text-to-image (T2I) generative AI systems. Through eight focus groups with 25 people with disabilities, we found that models repeatedly presented reductive archetypes for different disabilities. Often these representations reflected broader societal stereotypes and biases, which our participants were concerned to see reproduced through T2I. Our participants discussed further challenges with using these models including the current reliance on prompt engineering to reach satisfactorily diverse results. Finally, they offered suggestions for how to improve disability representation with solutions like showing multiple, heterogeneous images for a single prompt and including the prompt with images generated. Our discussion reflects on tensions and tradeoffs we found among the diverse perspectives shared to inform future research on representation-oriented generative AI system evaluation metrics and development processes. View details
    Preview abstract The article summarizes the unique challenges and strategies required for a successful go-to-market (GTM) strategy in the enterprise world. We cover how the enterprise PM function differs from the regular PM function, and why enterprise PMs must look at distribution as an inherent product process. We also share a framework for thinking about the various components of GTM strategy. Key aspects include customer segmentation, account acquisition strategies, product packaging, positioning, and marketing, as well as technical enablement and content distribution. View details
    Generative models improve fairness of medical classifiers under distribution shifts
    Ira Ktena
    Olivia Wiles
    Isabela Albuquerque
    Sylvestre-Alvise Rebuffi
    Ryutaro Tanno
    Danielle Belgrave
    Taylan Cemgil
    Nature Medicine (2024)
    Preview abstract Domain generalization is a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions might be lower than expected because of discrepancies between the data encountered during deployment and development. Underrepresentation of some groups or conditions during model development is a common cause of this phenomenon. This challenge is often not readily addressed by targeted data acquisition and ‘labeling’ by expert clinicians, which can be prohibitively expensive or practically impossible because of the rarity of conditions or the available clinical expertise. We hypothesize that advances in generative artificial intelligence can help mitigate this unmet need in a steerable fashion, enriching our training dataset with synthetic examples that address shortfalls of underrepresented conditions or subgroups. We show that diffusion models can automatically learn realistic augmentations from data in a label-efficient manner. We demonstrate that learned augmentations make models more robust and statistically fair in-distribution and out of distribution. To evaluate the generality of our approach, we studied three distinct medical imaging contexts of varying difficulty: (1) histopathology, (2) chest X-ray and (3) dermatology images. Complementing real samples with synthetic ones improved the robustness of models in all three medical tasks and increased fairness by improving the accuracy of clinical diagnosis within underrepresented groups, especially out of distribution. View details
    Preview abstract Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases and EquityMedQA, a collection of seven datasets enriched for adversarial queries. Both our human assessment framework and our dataset design process are grounded in an iterative participatory approach and review of Med-PaLM 2 answers. Through our empirical study, we find that our approach surfaces biases that may be missed by narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. While our approach is not sufficient to holistically assess whether the deployment of an artificial intelligence (AI) system promotes equitable health outcomes, we hope that it can be leveraged and built upon toward a shared goal of LLMs that promote accessible and equitable healthcare. View details
    ASTRA-5G: Automated Over-the-Air Security Testing and Research Architecture for 5G SA Devices
    Aanjhan Ranganathan
    Christina Pöpper
    Evangelos Bitsikas
    Michele Guerra
    Syed Khandker
    WiSec '24: Proceedings of the 17th ACM Conference on Security and Privacy in Wireless and Mobile Networks, ACM (2024)
    Preview abstract Despite the widespread deployment of 5G technologies, there exists a critical gap in security testing for 5G Standalone (SA) devices. Existing methods, largely manual and labor-intensive, are ill-equipped to fully uncover the state of security in the implementations of 5G-SA protocols and standards on devices, severely limiting the ability to conduct comprehensive evaluations. To address this issue, in this work, we introduce a novel, open-source framework that automates the security testing process for 5G SA devices. By leveraging enhanced functionalities of 5G SA core and Radio Access Network (RAN) software, our framework offers a streamlined approach to generating, executing, and evaluating test cases, specifically focusing on the Non-Access Stratum (NAS) layer. Our application of this framework across multiple 5G SA devices provides in-depth security insights, significantly improving testing efficiency and breadth. View details
    Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns
    Ariel Goldstein
    Avigail Grinstein-Dabush
    Haocheng Wang
    Zhuoqiao Hong
    Bobbi Aubrey
    Samuel A. Nastase
    Zaid Zada
    Eric Ham
    Harshvardhan Gazula
    Eliav Buchnik
    Werner Doyle
    Sasha Devore
    Patricia Dugan
    Roi Reichart
    Daniel Friedman
    Orrin Devinsky
    Adeen Flinker
    Uri Hasson
    Nature Communications (2024)
    Preview abstract Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. We hypothesize that language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent language. To test this hypothesis, we densely record the neural activity patterns in the inferior frontal gyrus (IFG) of three participants using dense intracranial arrays while they listened to a 30-minute podcast. From these fine-grained spatiotemporal neural recordings, we derive a continuous vectorial representation for each word (i.e., a brain embedding) in each patient. We demonstrate that brain embeddings in the IFG and the DLM contextual embedding space have common geometric patterns using stringent zero-shot mapping. The common geometric patterns allow us to predict the brain embedding of a given left-out word in IFG based solely on its geometrical relationship to other nonoverlapping words in the podcast. Furthermore, we show that contextual embeddings better capture the geometry of IFG embeddings than static word embeddings. The continuous brain embedding space exposes a vector-based neural code for natural language processing in the human brain. View details
    Differences between Patient and Clinician Submitted Images: Implications for Virtual Care of Skin Conditions
    Rajeev Rikhye
    Grace Eunhae Hong
    Margaret Ann Smith
    Aaron Loh
    Vijaytha Muralidharan
    Doris Wong
    Michelle Phung
    Nicolas Betancourt
    Bradley Fong
    Rachna Sahasrabudhe
    Khoban Nasim
    Alec Eschholz
    Kat Chou
    Peggy Bui
    Justin Ko
    Steven Lin
    Mayo Clinic Proceedings: Digital Health (2024)
    Preview abstract Objective: To understand and highlight the differences in clinical, demographic, and image quality characteristics between patient-taken (PAT) and clinic-taken (CLIN) photographs of skin conditions. Patients and Methods: This retrospective study applied logistic regression to data from 2500 deidentified cases in Stanford Health Care’s eConsult system, from November 2015 to January 2021. Cases with undiagnosable or multiple conditions or cases with both patient and clinician image sources were excluded, leaving 628 PAT cases and 1719 CLIN cases. Demographic characteristic factors, such as age and sex were self-reported, whereas anatomic location, estimated skin type, clinical signs and symptoms, condition duration, and condition frequency were summarized from patient health records. Image quality variables such as blur, lighting issues and whether the image contained skin, hair, or nails were estimated through a deep learning model. Results: Factors that were positively associated with CLIN photographs, post-2020 were as follows: age 60 years or older, darker skin types (eFST V/VI), and presence of skin growths. By contrast, factors that were positively associated with PAT photographs include conditions appearing intermittently, cases with blurry photographs, photographs with substantial nonskin (or nail/hair) regions and cases with more than 3 photographs. Within the PAT cohort, older age was associated with blurry photographs. Conclusion: There are various demographic, clinical, and image quality characteristic differences between PAT and CLIN photographs of skin concerns. The demographic characteristic differences present important considerations for improving digital literacy or access, whereas the image quality differences point to the need for improved patient education and better image capture workflows, particularly among elderly patients. View details
    Preview abstract Evaluation of instruction following capabilities for multi-modal, multi-turn chat is challenging. With potentially multiple instructions in the input model context, the task is time-consuming for human raters and we show that LLM-based judges are biased towards answers from the same model. We propose a new evaluation set, MMMT-IF, an image-based multi-turn Q&A task with added global instructions between questions, constraining the format of the answers. This reveals limitations of current models for following multiple instructions and is challenging as the models need to first retrieve multiple instructions spread out in the long chat history, and then reason over them to answer image-based questions with instruction constraints. All the instructions and constraints are program verifiable, i.e., verifying them is objective. We propose a set of metrics referred to as Programmatic Instruction Following (PIF) to measure the fraction of the instructions that are correctly followed while performing a reasoning task, and PIF-TOP-N-K, to measure the fraction of time at least K out of N sampled model responses achieve a PIF score of one. This is our most challenging metric, targeting both instruction following and robustness. We show that our proposed approach for evaluation of instruction following with the PIF metric is also aligned with ratings from humans, with over 70 percent correlation. Our experiments show that the models studied in this work, Gemini 1.5 Pro, GPT-4o, and Claude Sonnet 3.5, have a PIF metric that significantly deteriorates for long chats, highlighting an area with significant headroom for improvement. Across all chat turns when each response is repeated 4 times (PIF-TOP-4-4), GPT-4o and Gemini are only able to successfully follow all instructions 11 percent of the time. When, in addition to having instructions dispersed throughout the model input context, all the instructions are also added at the end of the model input context, we see an average 22.3 point improvement in the PIF metric, showing that the challenge with the task lies not only in following the instructions, but also in retrieving the instructions from the model context. View details
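The PIF and PIF-TOP-N-K definitions in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the boolean-list input format, and the choice to average over chat turns are all assumptions made for illustration.

```python
from typing import Sequence


def pif(followed: Sequence[bool]) -> float:
    """Fraction of instructions correctly followed in one response.

    `followed` holds one boolean per instruction (assumed input format).
    """
    if not followed:
        raise ValueError("need at least one instruction")
    return sum(followed) / len(followed)


def pif_top_n_k(samples_per_turn: Sequence[Sequence[Sequence[bool]]], k: int) -> float:
    """Fraction of turns where at least k of the N sampled responses have PIF == 1.

    Averaging over turns is an assumption; the abstract only says
    "fraction of time".
    """
    if not samples_per_turn:
        raise ValueError("need at least one turn")
    hits = 0
    for samples in samples_per_turn:
        # Count sampled responses that follow every instruction.
        perfect = sum(1 for s in samples if pif(s) == 1.0)
        if perfect >= k:
            hits += 1
    return hits / len(samples_per_turn)


# Hypothetical data: two turns, N = 2 sampled responses per turn,
# two instructions checked per response.
turns = [
    [[True, True], [True, True]],   # both samples follow all instructions
    [[True, False], [True, True]],  # only one sample is perfect
]
print(pif([True, False, True, True]))  # 0.75
print(pif_top_n_k(turns, k=2))         # 0.5
print(pif_top_n_k(turns, k=1))         # 1.0
```

With K equal to N, as in PIF-TOP-4-4, a turn only counts when every sampled response follows all instructions, which is why the abstract describes it as the most challenging metric.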
    Preview abstract We present a method for generating Streetscapes: long sequences of views through an on-the-fly synthesized city-scale scene. Our generation is conditioned by language input (e.g., city name, weather), as well as an underlying map/layout hosting the desired trajectory. Compared to recent models for video generation or 3D view synthesis, our method can scale to much longer-range camera trajectories, spanning several city blocks, while maintaining visual quality and consistency. To achieve this goal, we build on recent work on video diffusion, used within an autoregressive framework that can easily scale to long sequences. In particular, we introduce a new temporal imputation method that prevents our autoregressive approach from drifting from the distribution of realistic city imagery. We train our Streetscapes system on a compelling source of data (posed imagery from Google Street View, along with contextual map data), which allows users to generate city views conditioned on any desired city layout, with controllable camera poses. View details
    Conversational AI in health: Design considerations from a Wizard-of-Oz dermatology case study with users, clinicians and a medical LLM
    Brenna Li
    Amy Wang
    Patricia Strachan
    Julie Anne Seguin
    Sami Lachgar
    Karyn Schroeder
    Renee Wong
    Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, pp. 10
    Preview abstract Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact with either a clinician agent (portrayed by a dermatologist) or an LLM agent (supervised by a dermatologist) via synchronous multimodal chat. In both conditions, participants found the conversation to be helpful in understanding their medical situation and alleviating their concerns. Through qualitative coding of the conversation transcripts, we provide insight on the importance of empathy and effective information-seeking. We conclude with design considerations for future AI-based conversational agents in healthcare settings. View details
    Preview abstract Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions. However, many generated images still suffer from issues such as artifacts/implausibility, misalignment with text descriptions, and low aesthetic quality. Inspired by the success of Reinforcement Learning with Human Feedback (RLHF) for large language models, prior work collected human-provided scores as feedback on generated images and trained a reward model to improve the T2I generation. In this paper, we enrich the feedback signal by (i) marking image regions that are implausible or misaligned with the text, and (ii) annotating which keywords in the text prompt are not represented in the image. We collect such rich human feedback on 18K generated images and train a multimodal transformer to predict this rich feedback automatically. We show that the predicted rich human feedback can be leveraged to improve image generation, for example, by selecting high-quality training data to finetune and improve the generative models, or by creating masks with predicted heatmaps to inpaint the problematic regions. Notably, the improvements generalize to models (Muse) beyond those used to generate the images on which human feedback data were collected (Stable Diffusion variants). View details
    Artificial Intelligence in Healthcare: A Perspective from Google
    Lily Peng
    Lisa Lehmann
    Artificial Intelligence in Healthcare, Elsevier (2024)
    Preview abstract Artificial Intelligence (AI) holds the promise of transforming healthcare by improving patient outcomes, increasing accessibility and efficiency, and decreasing the cost of care. Realizing this vision of a healthier world for everyone everywhere requires partnerships and trust between healthcare systems, clinicians, payers, technology companies, pharmaceutical companies, and governments to drive innovations in machine learning and artificial intelligence to patients. Google is one example of a technology company that is partnering with healthcare systems, clinicians, and researchers to develop technology solutions that will directly improve the lives of patients. In this chapter we share landmark trials of the use of AI in healthcare. We also describe the application of our novel system of organizing information to unify data in electronic health records (EHRs) and bring an integrated view of patient records to clinicians. We discuss our consumer focused innovation in dermatology to help guide search journeys for personalized information about skin conditions. Finally, we share a perspective on how to embed ethics and a concern for all patients into the development of AI. View details
    Preview abstract AI-generated images are proliferating as a new visual medium. However, state-of-the-art image generation models do not output alternative (alt) text with their images, rendering them largely inaccessible to screen reader users (SRUs). Moreover, less is known about what information would be most desirable to SRUs in this new medium. To address this, we invited AI image creators and SRUs to evaluate alt text prepared from various sources and write their own alt text for AI images. Our mixed-methods analysis makes three contributions. First, we highlight creators’ perspectives on alt text, as creators are well-positioned to write descriptions of their images. Second, we illustrate SRUs’ alt text needs particular to the emerging medium of AI images. Finally, we discuss the promises and pitfalls of utilizing text prompts written as input for AI models in alt text generation, and areas where broader digital accessibility guidelines could expand to account for AI images. View details
    Mindful Breathing as an Effective Technique in the Management of Hypertension
    Aravind Natarajan
    Hulya Emir-Farinas
    Hao-Wei Su
    Frontiers in Physiology (2024)
    Preview abstract Introduction: Hypertension is one of the most important, modifiable risk factors for cardiovascular disease. The popularity of wearable devices provides an opportunity to test whether device guided slow mindful breathing may serve as a non-pharmacological treatment in the management of hypertension. Methods: Fitbit Versa-3 and Sense devices were used for this study. In addition, participants were required to own an FDA or Health Canada approved blood pressure measuring device. Advertisements were shown to 655,910 Fitbit users, of which 7,365 individuals expressed interest and filled out the initial survey. A total of 1,918 participants entered their blood pressure readings on at least 1 day and were considered enrolled in the study. Participants were instructed to download a guided mindful breathing app on their smartwatch device, and to engage with the app once a day prior to sleep. Participants measured their systolic and diastolic blood pressure prior to starting each mindful breathing session, and again after completion. All measurements were self reported. Participants were located in the United States or Canada. Results: Values of systolic and diastolic blood pressure were reduced following mindful breathing. There was also a decrease in resting systolic and diastolic measurements when measured over several days. For participants with a systolic pressure ≥ 130 mmHg, there was a decrease of 9.7 mmHg following 15 min of mindful breathing at 6 breaths per minute. When measured over several days, the resting systolic pressure decreased by an average of 4.3 mmHg. Discussion: Mindful breathing for 15 min a day, at a rate of 6 breaths per minute is effective in lowering blood pressure, and has both an immediate, and a short term effect (over several days). 
This large scale study demonstrates that device guided mindful breathing with a consumer wearable for 15 min a day is effective in lowering blood pressure, and a helpful complement to the standard of care. View details
    Stable quantum-correlated many-body states through engineered dissipation
    Xiao Mi
    Alexios Michailidis
    Sara Shabani
    Jerome Lloyd
    Rajeev Acharya
    Igor Aleiner
    Trond Andersen
    Markus Ansmann
    Frank Arute
    Kunal Arya
    Juan Atalaya
    Gina Bortoli
    Alexandre Bourassa
    Leon Brill
    Michael Broughton
    Bob Buckley
    Tim Burger
    Nicholas Bushnell
    Jimmy Chen
    Benjamin Chiaro
    Desmond Chik
    Charina Chou
    Josh Cogan
    Roberto Collins
    Paul Conner
    William Courtney
    Alex Crook
    Ben Curtin
    Alejo Grajales Dau
    Dripto Debroy
    Agustin Di Paolo
    Ilya Drozdov
    Andrew Dunsworth
    Lara Faoro
    Edward Farhi
    Reza Fatemi
    Vinicius Ferreira
    Ebrahim Forati
    Brooks Foxen
    Élie Genois
    William Giang
    Dar Gilboa
    Raja Gosula
    Steve Habegger
    Michael Hamilton
    Monica Hansen
    Sean Harrington
    Paula Heu
    Markus Hoffmann
    Trent Huang
    Ashley Huff
    Bill Huggins
    Sergei Isakov
    Justin Iveland
    Cody Jones
    Pavol Juhas
    Kostyantyn Kechedzhi
    Marika Kieferova
    Alexei Kitaev
    Andrey Klots
    Alexander Korotkov
    Fedor Kostritsa
    John Mark Kreikebaum
    Dave Landhuis
    Pavel Laptev
    Kim Ming Lau
    Lily Laws
    Joonho Lee
    Kenny Lee
    Yuri Lensky
    Alexander Lill
    Wayne Liu
    Orion Martin
    Amanda Mieszala
    Shirin Montazeri
    Alexis Morvan
    Ramis Movassagh
    Wojtek Mruczkiewicz
    Charles Neill
    Ani Nersisyan
    Michael Newman
    JiunHow Ng
    Murray Ich Nguyen
    Tom O'Brien
    Alex Opremcak
    Andre Petukhov
    Rebecca Potter
    Leonid Pryadko
    Charles Rocque
    Negar Saei
    Kannan Sankaragomathi
    Henry Schurkus
    Christopher Schuster
    Mike Shearn
    Aaron Shorter
    Noah Shutty
    Vladimir Shvarts
    Jindra Skruzny
    Clarke Smith
    Rolando Somma
    George Sterling
    Doug Strain
    Marco Szalay
    Alfredo Torres
    Guifre Vidal
    Cheng Xing
    Jamie Yao
    Ping Yeh
    Juhwan Yoo
    Grayson Young
    Yaxing Zhang
    Ningfeng Zhu
    Jeremy Hilton
    Anthony Megrant
    Yu Chen
    Vadim Smelyanskiy
    Dmitry Abanin
    Science, 383 (2024), pp. 1332-1337
    Preview abstract Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-range quantum correlations and a ground-state fidelity of 0.86 for 18 qubits at the critical point. In two dimensions, we found mutual information that extends beyond nearest neighbors. Lastly, by coupling the system to auxiliaries emulating reservoirs with different chemical potentials, we explored transport in the quantum Heisenberg model. Our results establish engineered dissipation as a scalable alternative to unitary evolution for preparing entangled many-body states on noisy quantum processors. View details