Yossi Matias

Yossi Matias is Vice President, Engineering & Research, at Google. He is on the Google Research leadership team, a global leader of AI for Health, Climate, and Education, and is pioneering efforts in Generative AI. His team works on impact-driven breakthrough research, innovation, and moonshots, advancing society-centered AI to help transform products and address societal challenges with life-saving and life-changing technologies.

Yossi is the global executive lead of Google's Health AI, driving AI research to help transform healthcare and make it more accessible for everyone, with multiple breakthroughs including Med-PaLM. As a leader of AI for Climate and Sustainability, he works on climate crisis mitigation (e.g., Greenlight) as well as adaptation, as the founding lead of Google's Crisis Response initiative (SOS alerts, flood forecasting, wildfire detection). His work on Generative AI includes efforts on factuality (including work leading to Bard's Double Check) and efficiency (speculative decoding). Yossi pioneered Conversational AI innovations towards ambient intelligence to help transform the phone experience (Google Duplex, Call Screen, Hold for Me) and remove barriers of modality and language (Live Caption, Live Relay, Euphonia, Read Aloud). He is a founding lead of Google's AI for Social Good, the Google for Startups Accelerator (with a particular focus on AI & ML), Startups for Sustainable Development, and Mind the Gap. He pioneered an initiative to bring hundreds of heritage collections online (including the Dead Sea Scrolls) and helped establish Google's Cultural Institute.

For over a decade Yossi was on Google's Search leadership team, driving strategic features and technologies (Autocomplete, Google Trends, Search Console, and Search experiences in weather, sports, dictionary, and more). He is the founding managing director of Google's center in Israel (growing it to over 2,500 on staff), supported Google's 4x growth in Bangalore, India, and oversees Google's expanding research center in Africa.

Yossi is on the Computer Science faculty at Tel Aviv University, and was previously a Research Scientist at Bell Labs and a visiting professor at Stanford. He has published over 150 papers and is the inventor of over 70 patents. He pioneered some of the early technologies for internet privacy, contextual search, and the effective analysis of big data. He is a recipient of the Gödel Prize, an ACM Fellow, and a recipient of the ACM Kanellakis Theory and Practice Award for seminal work on streaming algorithms, data sketches, and large-scale data analytics.

Authored Publications
Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components—GPPEs—from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, building on previously published data, to support the position that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.
Large language models (LLMs) hold immense promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. In this work, we present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and then conduct an empirical case study with Med-PaLM 2, resulting in the largest human evaluation study in this area to date. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases, and EquityMedQA, a collection of seven newly-released datasets comprising both manually-curated and LLM-generated questions enriched for adversarial queries. Both our human assessment framework and dataset design process are grounded in an iterative participatory approach and review of possible biases in Med-PaLM 2 answers to adversarial queries. Through our empirical study, we find that the use of a collection of datasets curated through a variety of methodologies, coupled with a thorough evaluation protocol that leverages multiple assessment rubric designs and diverse rater groups, surfaces biases that may be missed via narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. We emphasize that while our framework can identify specific forms of bias, it is not sufficient to holistically assess whether the deployment of an AI system promotes equitable health outcomes. We hope the broader community leverages and builds on these tools and methods towards realizing a shared goal of LLMs that promote accessible and equitable healthcare for all.
    Towards Conversational Diagnostic AI
    Anil Palepu
    Mike Schaekermann
    Khaled Saab
    Jan Freyberg
    Ryutaro Tanno
    Amy Wang
    Brenna Li
    Nenad Tomašev
    Karan Singhal
    Le Hou
    Albert Webson
    Kavita Kulkarni
    Sara Mahdavi
    Juro Gottweis
    Joelle Barral
    Kat Chou
    arXiv (2024) (to appear)
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
    Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study
    Mike Schaekermann
    Terry Spitz
    Malcolm Chelliah
    Heather Cole-Lewis
    Stephanie Farquhar
    Qinghan Xue
    Jenna Lester
    Cían Hughes
    Patricia Strachan
    Fraser Tan
    Peggy Bui
    Craig Mermel
    Lily Peng
    Sunny Virmani
    Ivor Horn
    Cameron Chen
    The Lancet eClinicalMedicine (2024)
Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.
Methods: Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients of 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated to pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.
Findings: Across all dermatologic conditions, the HEAL metric was 80.5% for prioritizing AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritizing AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritizing AI performance of age subpopulations based on DALYs.
Interpretation: Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.
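Once per-subpopulation health burden and model performance have been tabulated, the bootstrap estimate described above is straightforward. The sketch below is illustrative only, with made-up numbers; it uses the convention that R > 0 means better AI performance for subpopulations with worse outcomes (higher DALYs/YLLs), so readers should check the paper for the exact sign conventions and estimation details.

```python
import numpy as np
from scipy.stats import spearmanr

def heal_metric(health_burden, performance, n_boot=10_000, seed=0):
    """Bootstrap estimate in the spirit of the HEAL metric, p(R > 0).

    health_burden: per-subpopulation burden (e.g., DALYs or YLLs; higher = worse outcomes).
    performance:   per-subpopulation AI performance (e.g., top-3 agreement).
    Here R > 0 is taken to mean the model performs better for subpopulations
    with worse pre-existing health outcomes.
    """
    rng = np.random.default_rng(seed)
    burden = np.asarray(health_burden, dtype=float)
    perf = np.asarray(performance, dtype=float)
    n = len(burden)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample subpopulations with replacement
        r, _ = spearmanr(burden[idx], perf[idx])  # rank correlation on the resample
        if r > 0:
            hits += 1
    return hits / n_boot

# Toy example: subpopulations with higher burden receive higher top-3 agreement.
print(heal_metric([40.0, 60.0, 80.0, 120.0, 160.0], [0.78, 0.80, 0.83, 0.86, 0.90]))
```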
    Towards Generalist Biomedical AI
    Danny Driess
    Mike Schaekermann
    Andrew Carroll
    Chuck Lau
    Ryutaro Tanno
    Ira Ktena
    Anil Palepu
    Basil Mustafa
    Aakanksha Chowdhery
    Simon Kornblith
    Philip Mansfield
    Sushant Prakash
    Renee Wong
    Sunny Virmani
    Sara Mahdavi
    Bradley Green
    Ewa Dominowska
    Joelle Barral
    Karan Singhal
    Pete Florence
    NEJM AI (2024)
BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery. METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports. RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM Multimodal reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. CONCLUSIONS: Although considerable work is needed to validate these models in real-world cases and understand if cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems.
    Differences between Patient and Clinician Submitted Images: Implications for Virtual Care of Skin Conditions
    Rajeev Rikhye
    Grace Eunhae Hong
    Margaret Ann Smith
    Aaron Loh
    Vijaytha Muralidharan
    Doris Wong
    Michelle Phung
    Nicolas Betancourt
    Bradley Fong
    Rachna Sahasrabudhe
    Khoban Nasim
    Alec Eschholz
    Kat Chou
    Peggy Bui
    Justin Ko
    Steven Lin
    Mayo Clinic Proceedings: Digital Health (2024)
Objective: To understand and highlight the differences in clinical, demographic, and image quality characteristics between patient-taken (PAT) and clinic-taken (CLIN) photographs of skin conditions. Patients and Methods: This retrospective study applied logistic regression to data from 2500 deidentified cases in Stanford Health Care's eConsult system, from November 2015 to January 2021. Cases with undiagnosable or multiple conditions or cases with both patient and clinician image sources were excluded, leaving 628 PAT cases and 1719 CLIN cases. Demographic factors, such as age and sex, were self-reported, whereas anatomic location, estimated skin type, clinical signs and symptoms, condition duration, and condition frequency were summarized from patient health records. Image quality variables such as blur, lighting issues, and whether the image contained skin, hair, or nails were estimated through a deep learning model. Results: Factors that were positively associated with CLIN photographs post-2020 were as follows: age 60 years or older, darker skin types (eFST V/VI), and presence of skin growths. By contrast, factors that were positively associated with PAT photographs included conditions appearing intermittently, cases with blurry photographs, photographs with substantial nonskin (or nail/hair) regions, and cases with more than 3 photographs. Within the PAT cohort, older age was associated with blurry photographs. Conclusion: There are various demographic, clinical, and image quality characteristic differences between PAT and CLIN photographs of skin concerns. The demographic characteristic differences present important considerations for improving digital literacy or access, whereas the image quality differences point to the need for improved patient education and better image capture workflows, particularly among elderly patients.
    ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
    Shawn Xu
    Lin Yang
    Timo Kohlberger
    Martin Ma
    Atilla Kiraly
    Sahar Kazemzadeh
    Zakkai Melamed
    Jungyeon Park
    Patricia MacWilliams
    Chuck Lau
    Christina Chen
    Mozziyar Etemadi
    Sreenivasa Raju Kalidindi
    Kat Chou
    Shravya Shetty
    Daniel Golden
    Rory Pilgrim
    Krish Eswaran
    arXiv (2023)
Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined with, or grafted onto, a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
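As a rough illustration of the "lightweight adapter" idea described above (not the actual ELIXR architecture, whose encoders, token counts, and dimensions differ), the sketch below projects a frozen image encoder's embedding into a short sequence of soft tokens in a frozen LLM's embedding space; only the projection would be trained.

```python
import torch
from torch import nn

class ImageToLLMAdapter(nn.Module):
    """Illustrative adapter: map a frozen image encoder's output to soft tokens
    in a frozen LLM's embedding space. Dimensions and token count are made up."""
    def __init__(self, img_dim: int, llm_dim: int, n_tokens: int = 32):
        super().__init__()
        self.n_tokens, self.llm_dim = n_tokens, llm_dim
        self.proj = nn.Linear(img_dim, n_tokens * llm_dim)   # the only trainable part

    def forward(self, image_embedding: torch.Tensor) -> torch.Tensor:
        # image_embedding: (batch, img_dim) -> soft tokens: (batch, n_tokens, llm_dim)
        return self.proj(image_embedding).view(-1, self.n_tokens, self.llm_dim)

adapter = ImageToLLMAdapter(img_dim=1024, llm_dim=4096)
cxr_embedding = torch.randn(2, 1024)      # stand-in for a frozen CXR encoder output
print(adapter(cxr_embedding).shape)       # torch.Size([2, 32, 4096])
```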
Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, development of such models is often limited by availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential to reduce the data, compute, and technical expertise necessary to develop task-specific deep learning models with the required level of model performance. In this work, we describe the development and evaluation of foundation models for histopathology via self-supervised learning (SSL). We first establish a diverse set of benchmark tasks involving 17 unique tissue types and 12 unique cancer types and spanning different optimal magnifications and task types. Next, we use this benchmark to explore and evaluate histopathology-specific SSL methods followed by further evaluation on held out patch-level and weakly supervised tasks. We found that standard SSL methods thoughtfully applied to histopathology images are performant across our benchmark tasks and that domain-specific methodological improvements can further increase performance. Our findings reinforce the value of using domain-specific SSL methods in pathology, and establish a set of high quality foundation models to enable further research across diverse applications.
The application of an artificial intelligence (AI)-based screening tool for retinal disease in India and Thailand highlighted the myths and reality of introducing medical AI, which may form a framework for subsequent tools.
    Caravan - A global community dataset for large-sample hydrology
    Frederik Kratzert
    Nans Addor
    Tyler Erickson
    Martin Gauch
    Lukas Gudmundsson
    Daniel Klotz
    Sella Nevo
    Guy Shalev
    Scientific Data, vol. 10 (2023), pp. 61
High-quality datasets are essential to support hydrological science and modeling. Several CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) datasets exist for specific countries or regions; however, these datasets lack standardization, which makes global studies difficult. This paper introduces a dataset called Caravan (a series of CAMELS) that standardizes and aggregates seven existing large-sample hydrology datasets. Caravan includes meteorological forcing data, streamflow data, and static catchment attributes (e.g., geophysical, sociological, climatological) for 6830 catchments. Most importantly, Caravan is both a dataset and open-source software that allows members of the hydrology community to extend the dataset to new locations by extracting forcing data and catchment attributes in the cloud. Our vision is for Caravan to democratize the creation and use of globally-standardized large-sample hydrology datasets. Caravan is a truly global open-source community resource.
    A Neural Encoder for Earthquake Rate Forecasting
    Oleg Zlydenko
    Brendan Meade
    Alexandra Sharon Molchanov
    Sella Nevo
    Yohai bar Sinai
    Scientific Reports (2023)
Forecasting the timing of earthquakes is a long-standing challenge. Moreover, it is still debated how to formulate this problem in a useful manner, or how to compare the predictive power of different models. Here, we develop a versatile neural encoder of earthquake catalogs, and apply it to the fundamental problem of earthquake rate prediction, in the spatio-temporal point process framework. The epidemic type aftershock sequence model (ETAS) effectively learns a small number of parameters to constrain assumed functional forms for the space and time relationships of earthquake sequences (e.g., Omori-Utsu law). Here we introduce learned spatial and temporal embeddings for point process earthquake forecast models that capture complex correlation structures. We demonstrate the generality of this neural representation as compared with the ETAS model using train-test data splits and how it enables the incorporation of additional geophysical information. In rate prediction tasks, the generalized model shows > 4% improvement in information gain per earthquake and the simultaneous learning of anisotropic spatial structures analogous to fault traces. The trained network can also be used to perform short-term prediction tasks, showing similar improvement while providing a 1,000-fold reduction in run-time.
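For readers unfamiliar with the baseline, one common parameterization of the ETAS conditional intensity (notation varies across the literature) makes the assumed functional forms explicit; the neural encoder above replaces these fixed kernels with learned spatio-temporal embeddings.

```latex
\lambda(t, x \mid \mathcal{H}_t)
  \;=\; \mu(x)
  \;+\; \sum_{i:\, t_i < t} A\, e^{\alpha (m_i - M_c)}\,
        \left(t - t_i + c\right)^{-p} f\!\left(x - x_i;\, m_i\right)
```

Here mu(x) is the background rate, the (t - t_i + c)^{-p} factor is the Omori-Utsu temporal decay, the exponential term scales aftershock productivity with magnitude, and f is a spatial kernel around each past event.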
    UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image
    Dani Valevski
    Eyal Molad
    Eyal Segalis
    Yaniv Leviathan
    ACM Transactions on Graphics (2023)
Text-driven image generation methods have shown impressive results recently, allowing casual users to generate high quality images by providing textual descriptions. However, similar capabilities for editing existing images are still out of reach. Text-driven image editing methods usually need edit masks, struggle with edits that require significant visual changes and cannot easily keep specific details of the edited portion. In this paper we make the observation that image-generation models can be converted to image-editing models simply by fine-tuning them on a single image. We also show that initializing the stochastic sampler with a noised version of the base image before sampling and interpolating relevant details from the base image after sampling further increase the quality of the edit operation. Combining these observations, we propose UniTune, a novel image editing method. UniTune takes as input an arbitrary image and a textual edit description, and carries out the edit while maintaining high fidelity to the input image. UniTune does not require additional inputs, like masks or sketches, and can perform multiple edits on the same image without retraining. We test our method using the Imagen model in a range of different use cases. We demonstrate that it is broadly applicable and can perform a surprisingly wide range of expressive editing operations, including those requiring significant visual changes that were previously impossible.
Inference from large autoregressive models like Transformers is slow - decoding K tokens takes K serial runs of the model. In this work we introduce speculative decoding - an algorithm to sample from autoregressive models faster without any changes to the outputs, by computing several tokens in parallel. At the heart of our approach lie the observations that (1) hard language-modeling tasks often include easier subtasks that can be approximated well by more efficient models, and (2) using speculative execution and a novel sampling method, we can make exact decoding from the large models faster, by running them in parallel on the outputs of the approximation models, potentially generating several tokens concurrently, and without changing the distribution. Our method can accelerate existing off-the-shelf models without retraining or architecture changes. We demonstrate it on T5-XXL and show a 2X-3X acceleration compared to the standard T5X implementation, with identical outputs.
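The core of the method is an accept/reject rule that preserves the target model's output distribution exactly. Below is a minimal single-position sketch over a toy vocabulary (illustrative only; in practice the rule is applied to several drafted tokens verified in parallel by the large model).

```python
import numpy as np

def speculative_step(p_target, q_draft, draft_token, rng):
    """One accept/reject step of speculative sampling at a single position.

    p_target, q_draft: next-token distributions of the large (target) and
    small (draft) models. draft_token was sampled from q_draft. The returned
    token is distributed exactly according to p_target, which is why drafted
    tokens can be verified in parallel without changing the outputs.
    """
    accept_prob = min(1.0, p_target[draft_token] / q_draft[draft_token])
    if rng.random() < accept_prob:
        return draft_token, True
    # On rejection, resample from the residual distribution max(0, p - q), renormalized.
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual), False

rng = np.random.default_rng(0)
p = np.array([0.1, 0.6, 0.3])   # target model distribution over a toy vocabulary
q = np.array([0.3, 0.3, 0.4])   # cheaper draft model distribution
draft = rng.choice(3, p=q)
print(speculative_step(p, q, draft, rng))
```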
Health-related acoustic signals, such as cough and breathing sounds, are relevant for medical diagnosis and continuous health monitoring. Most existing machine learning approaches for health acoustics are trained and evaluated on specific tasks, limiting their generalizability across various healthcare applications. In this paper, we leverage a self-supervised learning framework, SimCLR with a Slowfast NFNet backbone, for contrastive learning of health acoustics. A crucial aspect of optimizing Slowfast NFNets for this application lies in identifying effective audio augmentations. We conduct an in-depth analysis of various audio augmentation strategies and demonstrate that an appropriate augmentation strategy enhances the performance of the Slowfast NFNet audio encoder across a diverse set of health acoustic tasks. Our findings reveal that when augmentations are combined, they can produce synergistic effects that exceed the benefits seen when each is applied individually.
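To make the augmentation idea concrete, here is a small illustrative sketch (not the augmentations or parameters studied in the paper) of producing two independently augmented views of the same recording, which is what contrastive methods like SimCLR consume as a positive pair.

```python
import numpy as np

def augment(wave, rng):
    """Illustrative augmentations only: random gain, additive noise, and a
    circular time shift (the paper studies a broader, tuned set)."""
    out = wave * rng.uniform(0.7, 1.3)                    # random gain
    out = out + rng.normal(0.0, 0.005, size=out.shape)    # additive noise
    return np.roll(out, rng.integers(0, len(out)))        # circular time shift

def two_views(wave, rng):
    """SimCLR-style positive pair: two independently augmented views of the
    same recording, each embedded by the audio encoder for contrastive learning."""
    return augment(wave, rng), augment(wave, rng)

rng = np.random.default_rng(0)
clip = np.sin(np.linspace(0, 100, 16000))   # stand-in for a one-second health-acoustic clip
view_a, view_b = two_views(clip, rng)
```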
    AI Increases Global Access to Reliable Flood Forecasts
    Asher Metzger
    Dana Weitzner
    Frederik Kratzert
    Guy Shalev
    Martin Gauch
    Sella Nevo
    Shlomo Shenzis
    Tadele Yednkachw Tekalign
    Vusumuzi Dube
    arXiv (2023)
Floods are one of the most common natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow gauge networks. Accurate and timely warnings are critical for mitigating flood risks, but hydrological simulation models typically must be calibrated to long data records in each watershed. Here we show that AI-based forecasting achieves reliability in predicting extreme riverine events in ungauged watersheds at up to a 5-day lead time that is similar to or better than the reliability of nowcasts (0-day lead time) from a current state of the art global modeling system (the Copernicus Emergency Management Service Global Flood Awareness System). Additionally, we achieve accuracies over 5-year return period events that are similar to or better than current accuracies over 1-year return period events. This means that AI can provide flood warnings earlier and over larger and more impactful events in ungauged basins. The model developed in this paper was incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.
    Face0: Instantaneously Conditioning a Text-to-Image Model on a Face
    Dani Valevski
    Danny Wasserman
    Yaniv Leviathan
    SIGGRAPH Asia 2023 Conference Papers (2023)
We present Face0, a novel way to instantaneously condition a text-to-image generation model on a face, in sample time, without any optimization procedures such as fine-tuning or inversions. We augment a dataset of annotated images with embeddings of the included faces and train an image generation model (we use Stable Diffusion) on the augmented dataset. Once trained, our system is practically identical at inference time to the underlying base model, and is therefore able to generate images, given a user-supplied face image, unknown at training time, and a prompt, in just a couple of seconds. While other methods, especially those that receive multiple user supplied images, suffer less from identity loss, our method still achieves pleasing results, is remarkably simple, extremely fast, and equips the underlying model with new capabilities, like controlling the generated images both via text or via direct manipulation of the input face embeddings. In addition, when using random vectors instead of face embeddings from a user supplied image, our method essentially solves the problem of consistent character generation across images. Finally, while requiring further research, we hope that our method, which decouples the model's textual biases from its biases on faces, might be a step towards some mitigation of biases in future text-to-image models.
    Towards Physician-Level Medical Question Answering with Large Language Models
    Karan Singhal
    Juro Gottweis
    Le Hou
    Kevin Clark
    Heather Cole-Lewis
    Mike Schaekermann
    Amy Wang
    Sami Lachgar
    Philip Mansfield
    Sushant Prakash
    Bradley Green
    Ewa Dominowska
    Nenad Tomašev
    Renee Wong
    Sara Mahdavi
    Joelle Barral
    arXiv (2023) (to appear)
Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.
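For intuition about the ensemble refinement prompting strategy mentioned above, the sketch below shows one way such a loop could be structured; `generate(prompt, temperature)` is a hypothetical helper returning a (reasoning, answer) pair, and the prompts, sample counts, and aggregation here are illustrative rather than those used for Med-PaLM 2.

```python
from collections import Counter

def ensemble_refinement(question, generate, n_drafts=8, n_refinements=16):
    """Sketch of an ensemble-refinement-style prompting loop (illustrative)."""
    # Stage 1: sample several independent reasoning paths at non-zero temperature.
    drafts = [generate(question, temperature=0.7) for _ in range(n_drafts)]
    context = question + "\n\nCandidate reasoning paths:\n" + "\n---\n".join(
        reasoning for reasoning, _ in drafts)

    # Stage 2: condition on the aggregated drafts and sample refined answers.
    refined = [generate(context, temperature=0.7)[1] for _ in range(n_refinements)]

    # Aggregate (e.g., plurality vote for multiple-choice questions).
    return Counter(refined).most_common(1)[0][0]
```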
    Discovering novel systemic biomarkers in external eye photos
    Ilana Traynis
    Christina Chen
    Akib Uddin
    Jorge Cuadros
    Lauren P. Daskivich
    April Y. Maa
    Ramasamy Kim
    Eugene Yu-Chuan Kang
    Lily Peng
    Avinash Varadarajan
    The Lancet Digital Health (2023)
Background: Photographs of the external eye were recently shown to reveal signs of diabetic retinal disease and elevated glycated haemoglobin. This study aimed to test the hypothesis that external eye photographs contain information about additional systemic medical conditions.
Methods: We developed a deep learning system (DLS) that takes external eye photographs as input and predicts systemic parameters, such as those related to the liver (albumin, aspartate aminotransferase [AST]); kidney (estimated glomerular filtration rate [eGFR], urine albumin-to-creatinine ratio [ACR]); bone or mineral (calcium); thyroid (thyroid stimulating hormone); and blood (haemoglobin, white blood cells [WBC], platelets). This DLS was trained using 123 130 images from 38 398 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles county, CA, USA. Evaluation focused on nine prespecified systemic parameters and leveraged three validation sets (A, B, C) spanning 25 510 patients with and without diabetes undergoing eye screening in three independent sites in Los Angeles county, CA, and the greater Atlanta area, GA, USA. We compared performance against baseline models incorporating available clinicodemographic variables (eg, age, sex, race and ethnicity, years with diabetes).
Findings: Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST >36.0 U/L, calcium <8.6 mg/dL, eGFR <60.0 mL/min/1.73 m², haemoglobin <11.0 g/dL, platelets <150.0 × 10³/μL, ACR ≥300 mg/g, and WBC <4.0 × 10³/μL on validation set A (a population resembling the development datasets), with the area under the receiver operating characteristic curve (AUC) of the DLS exceeding that of the baseline by 5.3–19.9% (absolute differences in AUC). On validation sets B and C, with substantial patient population differences compared with the development datasets, the DLS outperformed the baseline for ACR ≥300.0 mg/g and haemoglobin <11.0 g/dL by 7.3–13.2%.
Interpretation: We found further evidence that external eye photographs contain biomarkers spanning multiple organ systems. Such biomarkers could enable accessible and non-invasive screening of disease. Further work is needed to understand the translational implications.
    A full-STAC remedy for global digital health transformation: open standards, technologies, architectures and content
    Garrett Mehl
    Martin Seneviratne
    Matt L. Berg
    Suhel Bidani
    Rebecca L. Distler
    Marelize Gorgens
    Karin Kallander
    Alain B. Labrique
    Mark S. Landry
    Carl Leitner
    Peter Lubell-Doughtie
    Alvin D Marcelo
    Von Nguyen
    Jennifer Nelson
    Jean Philbert Nsengimana
    Maeghan Orton
    Daniel R Otzoy Garcia
    Daniel R Oyaole
    Natschja Ratanaprayul
    Susann Roth
    Merrick P Schaefer
    Dykki Settle
    Jing Tang
    Barakissa Tien-Wahser
    Steven Wanyee
    Fred Hersch
    Oxford Open Digital Health (2023)
The global digital health ecosystem is project-centric: point solutions are developed for vertical health programs and financed through vertical funding allocations. This results in data fragmentation and technology lock-in, compromising health care delivery. A convergence of trends enabled by interoperability and digital governance makes possible a shift towards person-focused health. Together, open Standards, open Technologies, open Architectures and open Content represent a next-generation 'full-STAC' remedy for digital health transformation. Local developers and implementers can avoid reinventing the wheel, and instead build digital tools suited to local needs—where data travels with an individual over time, evidence-based practice is easily integrated, and insights are gleaned from harmonized data. This is the culmination of the vision endorsed by 194 WHO Member States in the Global Strategy on Digital Health 2020 to 2025.
    Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation
    Ryutaro Tanno
    David Barrett
    Sumedh Ghaisas
    Sumanth Dathathri
    Abi See
    Johannes Welbl
    Karan Singhal
    Mike Schaekermann
    Rhys May
    Roy Lee
    SiWai Man
    Zahra Ahmed
    Sara Mahdavi
    Joelle Barral
    Ali Eslami
    Danielle Belgrave
    Shravya Shetty
    Po-Sen Huang
    Ira Ktena
    arXiv (2023)
Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offers clear potential in ameliorating the situation, the path to real-world adoption has been stymied by the challenge of evaluating the clinical quality of AI-generated reports. In this study, we build a state-of-the-art report generation system for chest radiographs, Flamingo-CXR, by fine-tuning a well-known vision-language foundation model on radiology data. To evaluate the quality of the AI-generated reports, a group of 16 certified radiologists provide detailed evaluations of AI-generated and human-written reports for chest X-rays from an intensive care setting in the United States and an inpatient setting in India. At least one radiologist (out of two per case) preferred the AI report to the ground truth report in over 60% of cases for both datasets. Amongst the subset of AI-generated reports that contain errors, the most frequently cited reasons were related to the location and finding, whereas for human-written reports, most mistakes were related to severity and finding. This disparity suggested potential complementarity between our AI system and human experts, prompting us to develop an assistive scenario in which Flamingo-CXR generates a first-draft report, which is subsequently revised by a clinician. This is the first demonstration of clinician-AI collaboration for report writing, and the resultant reports are assessed to be equivalent or preferred by at least one radiologist to reports written by experts alone in 80% of in-patient cases and 60% of intensive care cases.
    Towards Accurate Differential Diagnosis with Large Language Models
    Daniel McDuff
    Mike Schaekermann
    Anil Palepu
    Amy Wang
    Karan Singhal
    Yash Sharma
    Kavita Kulkarni
    Le Hou
    Sara Mahdavi
    Sushant Prakash
    Anupam Pathak
    Shwetak Patel
    Ewa Dominowska
    Juro Gottweis
    Joelle Barral
    Kat Chou
    Jake Sunshine
    arXiv (2023)
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.
    Large Language Models Encode Clinical Knowledge
    Karan Singhal
    Sara Mahdavi
    Jason Wei
    Hyung Won Chung
    Nathan Scales
    Ajay Tanwani
    Heather Cole-Lewis
    Perry Payne
    Martin Seneviratne
    Paul Gamble
    Abubakr Abdelrazig Hassan Babiker
    Nathanael Schaerli
    Aakanksha Chowdhery
    Philip Mansfield
    Dina Demner-Fushman
    Katherine Chou
    Juraj Gottweis
    Nenad Tomašev
    Alvin Rajkomar
    Joelle Barral
    Nature (2023)
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
    Shared computational principles for language processing in humans and deep language models
    Ariel Goldstein
    Zaid Zada
    Eliav Buchnik
    Amy Price
    Bobbi Aubrey
    Samuel A. Nastase
    Harshvardhan Gazula
    Gina Choe
    Aditi Rao
    Catherine Kim
    Colton Casto
    Lora Fanda
    Werner Doyle
    Daniel Friedman
    Patricia Dugan
    Lucia Melloni
    Roi Reichart
    Sasha Devore
    Adeen Flinker
    Liat Hasenfratz
    Omer Levy
    Kenneth A. Norman
    Orrin Devinsky
    Uri Hasson
    Nature Neuroscience (2022)
Departing from traditional linguistic models, advances in deep learning have resulted in a new type of predictive (autoregressive) deep language models (DLMs). Using a self-supervised next-word prediction task, these models generate appropriate linguistic responses in a given context. In the current study, nine participants listened to a 30-min podcast while their brain responses were recorded using electrocorticography (ECoG). We provide empirical evidence that the human brain and autoregressive DLMs share three fundamental computational principles as they process the same natural narrative: (1) both are engaged in continuous next-word prediction before word onset; (2) both match their pre-onset predictions to the incoming word to calculate post-onset surprise; (3) both rely on contextual embeddings to represent words in natural contexts. Together, our findings suggest that autoregressive DLMs provide a new and biologically feasible computational framework for studying the neural basis of language.
    TRUE: Re-evaluating Factual Consistency Evaluation
    Or Honovich
    Hagai Taitelbaum
    Vered Cohen
    Thomas Scialom
    NAACL 2022, The Association for Computational Linguistics (2022)
Grounded text generation systems often generate text that contains factual inconsistencies, hindering their real-world applicability. Automatic factual consistency evaluation may help alleviate this limitation by accelerating evaluation cycles, filtering inconsistent outputs and augmenting training data. While attracting increasing attention, such evaluation metrics are usually developed and evaluated in silo for a single task or dataset, slowing their adoption. Moreover, previous meta-evaluation protocols focused on system-level correlations with human annotations, which leave the example-level accuracy of such metrics unclear. In this work, we introduce TRUE: a comprehensive study of factual consistency metrics on a standardized collection of existing texts from diverse tasks, manually annotated for factual consistency. Our standardization enables an example-level meta-evaluation protocol that is more actionable and interpretable than previously reported correlations, yielding clearer quality measures. Across diverse state-of-the-art metrics and 11 datasets we find that large-scale NLI and question generation-and-answering-based approaches achieve strong and complementary results. We recommend those methods as a starting point for model and metric developers, and hope TRUE will foster progress towards even better methods.
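As an illustration of the NLI-based family of metrics the study finds effective, the sketch below scores a generated sentence by the probability that the grounding text entails it. The model name is a placeholder, and the label-to-index mapping varies by checkpoint.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "your-nli-model"  # placeholder: any NLI classifier with an "entailment" label
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def nli_consistency_score(grounding: str, generated: str) -> float:
    """Probability that the grounding text entails the generated text,
    used as an example-level factual consistency score."""
    inputs = tokenizer(grounding, generated, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    entail_idx = model.config.label2id["entailment"]  # label names vary by checkpoint
    return probs[entail_idx].item()
```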
Physicians record their detailed thought-processes about diagnoses and treatments as unstructured text in a section of a clinical note called the assessment and plan. This information is more clinically rich than structured billing codes assigned for an encounter but harder to reliably extract given the complexity of clinical language and documentation habits. We describe and release a dataset containing annotations of 579 admission and progress notes from the publicly available and de-identified MIMIC-III ICU dataset with over 30,000 labels identifying active problems, their assessment, and the category of associated action items (e.g. medication, lab test). We also propose deep-learning based models that approach human performance, with an F1 score of 0.88. We found that by employing weak supervision and domain specific data-augmentation, we could improve generalization across departments and reduce the number of human labeled notes without sacrificing performance.
    Discovering novel systemic biomarkers in photos of the external eye
    Boris Babenko
    Ilana Traynis
    Christina Chen
    Preeti Singh
    Akib Uddin
    Jorge Cuadros
    Lauren P. Daskivich
    April Y. Maa
    Kim Ramasamy
    Eugene Yu-Chuan Kang
    Gregory S. Corrado
    Lily Peng
    Dale R. Webster
    Christopher Semturs
    Jonathan Krause
    Avinash V. Varadarajan
    Naama Hammel
    Yun Liu
    (2022)
External eye photos were recently shown to reveal signs of diabetic retinal disease and elevated HbA1c. In this paper, we evaluate if external eye photos contain information about additional systemic medical conditions. We developed a deep learning system (DLS) that takes external eye photos as input and predicts multiple systemic parameters, such as those related to the liver (albumin, AST); kidney (eGFR estimated using the race-free 2021 CKD-EPI creatinine equation, the urine ACR); bone & mineral (calcium); thyroid (TSH); and blood count (Hgb, WBC, platelets). Development leveraged 151,237 images from 49,015 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles county, CA. Evaluation focused on 9 pre-specified systemic parameters and leveraged 3 validation sets (A, B, C) spanning 28,869 patients with and without diabetes undergoing eye screening in 3 independent sites in Los Angeles County, CA, and the greater Atlanta area, GA. We compared against baseline models incorporating available clinicodemographic variables (e.g. age, sex, race/ethnicity, years with diabetes). Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST>36, calcium<8.6, eGFR<60, Hgb<11, platelets<150, ACR>=300, and WBC<4 on validation set A (a patient population similar to the development sets), where the AUC of DLS exceeded that of the baseline by 5.2-19.4%. On validation sets B and C, with substantial patient population differences compared to the development sets, the DLS outperformed the baseline for ACR>=300 and Hgb<11 by 7.3-13.2%. Our findings provide further evidence that external eye photos contain important biomarkers of systemic health spanning multiple organ systems. Further work is needed to investigate whether and how these biomarkers can be translated into clinical impact.
AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models learn from the training data is severely lacking. Explanations are needed in order to increase the trust of doctors in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that aren't yet known to experts. In this paper, we present a method for automatic visual explanations that can help achieve these goals by generating hypotheses of what visual signals in the images are correlated with the task. We propose the following 4 steps: (i) Train a classifier to perform a given task to assess whether the imagery indeed contains signals relevant to the task; (ii) Train a StyleGAN-based image generator with an architecture that enables guidance by the classifier ("StylEx"); (iii) Automatically detect and extract the top visual attributes that the classifier is sensitive to. Each of these attributes can then be independently modified for a set of images to generate counterfactual visualizations of those attributes (i.e. what that image would look like with the attribute increased or decreased); (iv) Present the discovered attributes and corresponding counterfactual visualizations to a multidisciplinary panel of experts to formulate hypotheses for the underlying mechanisms with consideration to social and structural determinants of health (e.g. whether the attributes correspond to known patho-physiological or socio-cultural phenomena, or could be novel discoveries) and stimulate future research. To demonstrate the broad applicability of our approach, we demonstrate results on eight prediction tasks across three medical imaging modalities – retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples where many of the automatically-learned attributes clearly capture clinically known features (e.g., types of cataract, enlarged heart), and demonstrate automatically-learned confounders that arise from factors beyond physiological mechanisms (e.g., chest X-ray underexposure is correlated with the classifier predicting abnormality, and eye makeup is correlated with the classifier predicting low hemoglobin levels). We further show that our method reveals a number of physiologically plausible novel attributes for future investigation (e.g., differences in the fundus associated with self-reported sex, which were previously unknown). While our approach is not able to discern causal pathways, the ability to generate hypotheses from the attribute visualizations has the potential to enable researchers to better understand, improve their assessment, and extract new knowledge from AI-based models. Importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real world nature of healthcare delivery and socio-cultural factors, and hence multidisciplinary perspectives are critical in these investigations. Finally, we release code to enable researchers to train their own StylEx models and analyze their predictive tasks of interest, and use the methodology presented in this paper for responsible interpretation of the revealed attributes.
    Eddystone-EID: Secure and Private Infrastructural Protocol for BLE Beacons
    Liron David
    Alon Ziv
    IEEE Transactions on Information Forensics and Security (2022)
Beacons are small devices that play an important role in the Internet of Things (IoT), connecting "things" without an IP connection to the Internet via Bluetooth Low Energy (BLE) communication. In this paper we present the first private end-to-end encryption protocol, called the Eddystone-Ephemeral-ID (Eddystone-EID) protocol. This protocol enables connectivity from any beacon to its remote owner, while supporting the beacon's privacy and security, and essentially preserving the beacon's low power consumption. We describe the Eddystone-EID development goals, discuss the design decisions, show the cryptographic solution, and analyse its privacy, security, and performance. Finally, we present three secure IoT applications built on Eddystone-EID, demonstrating its utility as a security and privacy infrastructure in the IoT domain. Further, Eddystone-EID is a prototypical example of security design for an asymmetric system in which on one side there are small power-deficient elements (the beacons) and on the other side there is a powerful computing engine (a cloud). The crux of the design strategy is based on: (1) transferring work from the beacon to the cloud, and then (2) building a trade-off between cloud online work against cloud offline work, in order to enable fast real-time reaction of the cloud. These two principles seem to be generic and can be used for other problems in the IoT domain.
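The general idea behind rotating ephemeral identifiers can be illustrated in a few lines. This is a simplified sketch using an HMAC over a quantized time counter, not the actual Eddystone-EID construction, which is specified separately and designed for the beacon's constrained hardware.

```python
import hashlib
import hmac
import time

def ephemeral_id(identity_key, rotation_period_s=512, now=None):
    """Simplified rotating-identifier sketch (not the real Eddystone-EID scheme):
    derive a short ID from a shared identity key and a quantized time counter.
    Without the key, broadcasts from different rotation windows cannot be linked;
    the owner's resolver can recompute candidate IDs for nearby windows."""
    t = int(time.time() if now is None else now)
    counter = (t // rotation_period_s).to_bytes(8, "big")
    return hmac.new(identity_key, counter, hashlib.sha256).digest()[:8]

key = bytes(range(16))
print(ephemeral_id(key).hex())          # ID for the current rotation window
print(ephemeral_id(key, now=0).hex())   # a different window yields an unlinkable ID
```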
A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters.
We present efficient differentially private algorithms for learning unions of polygons in the plane (which are not necessarily convex). Our algorithms are $(\alpha,\beta)$--probably approximately correct and $(\varepsilon,\delta)$--differentially private using a sample of size $\tilde{O}\left(\frac{1}{\alpha\varepsilon}k\log d\right)$, where the domain is $[d]\times[d]$ and $k$ is the number of edges in the union of polygons. Our algorithms are obtained by designing a private variant of the classical (nonprivate) learner for conjunctions using the greedy algorithm for set cover.
    Artificial intelligence for phase recognition in complex laparoscopic cholecystectomy
    Tomer Golany
    Amit Aides
    Nadav Avraham Rabani
    Wisam Khoury
    Hanoch Kashtan
    Petachia Reissman
    Surgical Endoscopy (2022)
Background: The potential role and benefits of AI in surgery have yet to be determined. This study is a first step in developing an AI system for minimizing adverse events and improving patient safety. We developed an Artificial Intelligence (AI) algorithm and evaluated its performance in recognizing surgical phases of laparoscopic cholecystectomy (LC) videos spanning a range of complexities. Methods: A set of 371 LC videos with various complexity levels and containing adverse events was collected from five hospitals. Two expert surgeons segmented each video into 10 phases including Calot's triangle dissection and clipping and cutting. For each video, adverse events were also annotated when present (major bleeding; gallbladder perforation; major bile leakage; and incidental finding) and complexity level (on a scale of 1–5) was also recorded. The dataset was then split in an 80:20 ratio (294 and 77 videos), stratified by complexity, hospital, and adverse events to train and test the AI model, respectively. The AI-surgeon agreement was then compared to the agreement between surgeons. Results: The mean accuracy of the AI model for surgical phase recognition was 89% [95% CI 87.1%, 90.6%], comparable to the mean inter-annotator agreement of 90% [95% CI 89.4%, 90.5%]. The model's accuracy was inversely associated with procedure complexity, decreasing from 92% (complexity level 1) to 88% (complexity level 3) to 81% (complexity level 5). Conclusion: The AI model successfully identified surgical phases in both simple and complex LC procedures. Further validation and system training are warranted to evaluate its potential applications such as to increase patient safety during surgery.
    Preview abstract We present efficient differentially private algorithms for learning polygons in the plane (which are not necessarily convex). Our algorithm achieves $(\alpha,\beta)$-PAC learning and $(\varepsilon,\delta)$-differential privacy using a sample of size $O\left(\frac{k}{\alpha\varepsilon}\log\left(\frac{|X|}{\beta\delta}\right)\right)$, where the domain is $X\times X$ and $k$ is the number of edges in the (potentially non-convex) polygon. View details
    Preview abstract “Exposure Notification (EN) systems”, which have been envisioned by a number of academic and industry groups, are useful in aiding health authorities worldwide to fight the COVID-19 pandemic spread via contact tracing. Among these systems, many rely on the BLE-based Google-Apple Exposure Notification (GAEN) API (for iPhones and Android systems). We assert that it is now time to investigate how to deal with scale issues, assuming the next pandemic or variant will be more extensive. To this end, we present two modular enhancements to scale up the GAEN API by improving performance and suggesting a better performance-privacy tradeoff. Our modifications have the advantage of affecting only the GAEN API modules and do not require any change to the systems built on top of it, so they can be easily adopted as needs emerge. The techniques we suggest in this paper (called “dice and splice” and “forest from the PRF-tree”) are general and applicable to scenarios of searching values within anonymous pseudo-randomly generated sequences. View details
    Flood forecasting with machine learning models in an operational framework
    Asher Metzger
    Chen Barshai
    Dana Weitzner
    Frederik Kratzert
    Gregory Begelman
    Guy Shalev
    Hila Noga
    Moriah Royz
    Niv Giladi
    Ronnie Maor
    Sella Nevo
    Yotam Gigi
    HESS (2022)
    Preview abstract Google’s operational flood forecasting system was developed to provide accurate real-time flood warnings to agencies and the public, with a focus on riverine floods in large, gauged rivers. It became operational in 2018 and has since expanded geographically. This forecasting system consists of four subsystems: data validation, stage forecasting, inundation modeling, and alert distribution. Machine learning is used for two of the subsystems. Stage forecasting is modeled with Long Short-Term Memory (LSTM) networks and Linear models. Flood inundation is computed with the Thresholding and Manifold models, where the former computes inundation extent and the latter computes both inundation extent and depth. The Manifold model, presented here for the first time, provides a machine-learning alternative to hydraulic modeling of flood inundation. When evaluated on historical data, all models achieve sufficiently high performance metrics for operational use. The LSTM showed higher skill than the Linear model, while the Thresholding and Manifold models achieved similar performance metrics for modeling inundation extent. During the 2021 monsoon season, the flood warning system was operational in India and Bangladesh, covering flood-prone regions around rivers with a total area of 287,000 km², home to more than 350M people. More than 100M flood alerts were sent to affected populations, to relevant authorities, and to emergency organizations. Current and future work on the system includes extending coverage to additional flood-prone locations, as well as improving modeling capabilities and accuracy. View details
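As a toy illustration of the thresholding idea for inundation extent (not the operational model described above), the sketch below marks a cell of a small elevation grid as inundated when the forecast river stage exceeds its ground elevation. The elevation values, datum, and forecast stage are made-up numbers.

```python
import numpy as np

def inundation_extent(elevation_m: np.ndarray, forecast_stage_m: float) -> np.ndarray:
    """Toy thresholding model: a cell floods when the forecast water surface
    is higher than its ground elevation. Returns a boolean inundation mask."""
    return elevation_m <= forecast_stage_m

# --- usage with a made-up 4x4 elevation grid (meters above a common datum) ---
dem = np.array([
    [2.1, 2.3, 2.8, 3.5],
    [1.9, 2.0, 2.6, 3.1],
    [1.7, 1.8, 2.2, 2.9],
    [1.5, 1.6, 2.0, 2.5],
])
mask = inundation_extent(dem, forecast_stage_m=2.0)
print(mask)
print("inundated fraction:", mask.mean())
```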
    Adversarial Robustness of Streaming Algorithms through Importance Sampling
    Vladimir Braverman
    Sandeep Silwal
    Samson Zhou
    Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (2021)
    Preview abstract Robustness against adversarial attacks has recently been at the forefront of algorithmic design for machine learning tasks. In the adversarial streaming model, an adversary gives an algorithm a sequence of adaptively chosen updates $u_1,\ldots,u_n$ as a data stream. The goal of the algorithm is to compute or approximate some predetermined function for every prefix of the adversarial stream, but the adversary may generate future updates based on previous outputs of the algorithm. In particular, the adversary may gradually learn the random bits internally used by an algorithm to manipulate dependencies in the input. This is especially problematic as many important problems in the streaming model require randomized algorithms, since they are known not to admit any deterministic algorithms that use sublinear space. In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction. For regression and other numerical linear algebra related tasks, we consider the row arrival streaming model. Our results are based on a simple, but powerful, observation that sampling-based algorithms give rise to adversarial robustness, in contrast to sketching-based algorithms, which are very prevalent in the streaming literature but suffer from adversarial attacks. In addition, we show that the well-known merge and reduce paradigm in streaming is adversarially robust. Since the merge and reduce paradigm defines coreset constructions, we thus obtain robust algorithms for $k$-means, $k$-median, $k$-center, Bregman clustering, projective clustering, principal component analysis (PCA) and non-negative matrix factorization. To the best of our knowledge, these are the first adversarially robust methods for these problems. View details
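The merge-and-reduce paradigm referred to above can be sketched in a few lines. In the sketch below the "reduce" step is a plain uniform subsample standing in for a proper coreset construction, so this illustrates the control flow of the paradigm rather than any particular coreset from the paper.

```python
import random

def merge_and_reduce_stream(stream, block_size=4, seed=0):
    """Maintain one summary per 'level'; whenever two summaries collide at a
    level, merge them and reduce back to block_size points (here: a uniform
    subsample standing in for a coreset construction)."""
    rng = random.Random(seed)
    levels = {}          # level -> summary (list of points)
    buffer = []
    for x in stream:
        buffer.append(x)
        if len(buffer) < block_size:
            continue
        summary, level = buffer, 0
        buffer = []
        while level in levels:                         # carry, like binary addition
            summary = levels.pop(level) + summary      # merge
            summary = rng.sample(summary, block_size)  # reduce
            level += 1
        levels[level] = summary
    # final summary: merge whatever remains
    out = buffer
    for s in levels.values():
        out = out + s
    return out

print(merge_and_reduce_stream(range(100)))
```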
    Detection of Elusive Polyps via a Large Scale AI System
    Dan Livovsky
    Danny Veikherman
    Tomer Golany
    Amit Aides
    Valentin Dashinsky
    Nadav Rabani
    David Ben Shimol
    Yochai Blau
    Ilan Moshe Shimshoni
    Ori Segol
    Eran Goldin
    Jesse Lachter
    Gastrointestinal Endoscopy (2021)
    Preview abstract Colorectal cancer (CRC) is the second leading cause of cancer death worldwide resulting in an estimated 900,000 deaths per year. Colonoscopy is the gold standard for detection and removal of precancerous lesions, and has been amply shown to reduce mortality. However, the miss rate for polyps during colonoscopies is 22-28%, while 20-24% of the missed lesions are histologically confirmed adenomas. To address this shortcoming, we propose a polyp detection system based on deep learning, which can alert the operator in real time to the presence and location of polyps during a colonoscopy. We dub the system DEEP^2: DEEP DEtection of Elusive Polyps. The DEEP^2 system was trained on 3,611 hours of colonoscopy videos derived from two sources, and was validated on a set comprising 1,393 hours of video, coming from a third, unrelated source. For the validation set, the ground truth labelling was provided by offline GI annotators, who were able to watch the video in slow-motion and pause/rewind as required; two or three such annotators examined each video. Overall, DEEP^2 achieves a sensitivity of 96.8% at 4.9 false alarms per video, which improves substantially on the current state of the art. These results are attained using a neural network architecture which is designed to provide fast computations, and can therefore run in real time at greater than 30 frames per second. We further analyze the data by examining its performance on elusive polyps, those polyps which are particularly difficult for endoscopists to detect. First, we show that on fast polyps that are in the field of view for less than 5 seconds, DEEP^2 attains a sensitivity of 88.5%, compared to a sensitivity of 31.7% for the endoscopists performing the procedure. On even shorter duration polyps, those that are in the field of view for less than 2 seconds, the difference is even starker: DEEP^2 attains a sensitivity of 84.9% vs. 18.9% for the endoscopists. Second, we examine procedures which are apparently clean, in that no polyps are detected by either the performing endoscopist or the offline annotators. In these sequences, DEEP^2 is able to detect polyps -- not seen by either live endoscopists or offline annotators -- which were later verified to be real polyps: an average of 0.22 polyps per sequence, of which 0.10 are adenomas. Finally, a preliminary small clinical validation indicates that the system will be useful in practice: on 32 procedures, DEEP^2 discovered an average of 1.06 polyps per procedure that would have otherwise been missed by the GI performing the procedure. Future work will be needed to measure the clinical impact on a larger scale. View details
    Physics aware downsampling with deep learning for scalable flood modeling
    Niv Giladi
    Sella Nevo
    Daniel Soudry
    Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (2021)
    Preview abstract Background: Floods are the most common natural disaster in the world, affecting the lives of hundreds of millions. Flood forecasting is therefore a vitally important endeavor, typically achieved using physical water flow simulations, which rely on accurate terrain elevation maps. However, such simulations, based on solving partial differential equations, are computationally prohibitive on a large scale. This scalability issue is commonly alleviated using a coarse grid representation of the elevation map, though this representation may distort crucial terrain details, leading to significant inaccuracies in the simulation. Contributions: We train a deep neural network to perform physics-informed downsampling of the terrain map: we optimize the coarse grid representation of the terrain maps, so that the flood prediction will match the fine grid solution. For the learning process to succeed, we configure a dataset specifically for this task. We demonstrate that with this method, it is possible to achieve a significant reduction in computational cost, while maintaining an accurate solution. A reference implementation accompanies the paper as well as documentation and code for dataset reproduction. View details
    Learning and Evaluating a Differentially Private Pre-trained Language Model
    Shlomo Hoory
    Avichai Tendler
    Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 1178-1189
    Preview abstract Contextual language models have led to significantly better results on a plethora of language understanding tasks, especially when pre-trained on the same data as the downstream task. While this additional pre-training usually improves performance, it often leads to information leakage and therefore risks the privacy of individuals mentioned in the training data. One method to guarantee the privacy of such individuals is to train a differentially private model, but this usually comes at the expense of model performance. Moreover, it is hard to tell, given a privacy parameter $\epsilon$, what the effect on the trained representation was and whether it maintained relevant information while improving privacy. To improve privacy and guide future practitioners and researchers, we demonstrate here how to train a differentially private pre-trained language model (i.e., BERT) with a privacy guarantee of $\epsilon=0.5$ and only a small degradation in performance. We experiment on a dataset of clinical notes with a model trained on an entity extraction (EE) task and compare it to a similar model trained without differential privacy. Finally, we present a series of experiments showing how to interpret the differentially private representation and understand the information lost and maintained in this process. View details
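A minimal sketch of the standard recipe for differentially private training (per-example gradient clipping plus Gaussian noise, i.e. DP-SGD) is shown below on a toy logistic-regression objective rather than BERT. The clipping norm, noise multiplier, and data are illustrative assumptions, and the sketch omits the privacy accounting needed to report an actual epsilon.

```python
import numpy as np

def dp_sgd_logreg(X, y, epochs=20, lr=0.5, clip=1.0, noise_mult=1.0, seed=0):
    """Toy DP-SGD: clip each per-example gradient to L2 norm `clip`,
    sum, add Gaussian noise scaled by `noise_mult * clip`, then average."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        per_example_grads = (preds - y)[:, None] * X              # shape (n, d)
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads / np.maximum(1.0, norms / clip)
        noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
        w -= lr * (clipped.sum(axis=0) + noise) / n
    return w

# --- usage on made-up data ---
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print(dp_sgd_logreg(X, y))
```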
    Preview abstract We study conversational domain exploration (CODEX), where the user’s goal is to enrich her knowledge of a given domain by conversing with an informative bot. Such conversations should be well grounded in high-quality domain knowledge as well as engaging and open-ended. A CODEX bot should be proactive and introduce relevant information even if not directly asked for by the user. The bot should also appropriately pivot the conversation to undiscovered regions of the domain. To address these dialogue characteristics, we introduce a novel approach termed dynamic composition that decouples candidate content generation from the flexible composition of bot responses. This allows the bot to control the source, correctness and quality of the offered content, while achieving flexibility via a dialogue manager that selects the most appropriate contents in a compositional manner. We implemented a CODEX bot based on dynamic composition and integrated it into the Google Assistant. As an example domain, the bot conversed about the NBA basketball league in a seamless experience, such that users were not aware whether they were conversing with the vanilla system or the one augmented with our CODEX bot. Results are positive and offer insights into what makes for a good conversation. To the best of our knowledge, this is the first real user experiment of open-ended dialogues as part of a commercial assistant system. View details
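A hedged sketch of the dynamic-composition idea described above: independent content providers propose candidate utterances with a topic and a score, and a separate dialogue manager composes the response from the best candidates while steering toward undiscussed regions of the domain. The provider names, scoring rule, and NBA facts below are illustrative placeholders, not the production system.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    topic: str
    score: float   # provider's own confidence/quality estimate

def fact_provider(state):
    # Illustrative, hard-coded facts; a real provider would query a curated KB.
    facts = {"teams": "The NBA has 30 teams.",
             "season": "The regular season runs 82 games per team."}
    return [Candidate(text, topic, 0.8) for topic, text in facts.items()]

def pivot_provider(state):
    # Proposes pivots to topics the conversation has not covered yet.
    unseen = [t for t in ("teams", "season", "playoffs") if t not in state["covered"]]
    return [Candidate(f"Want to hear about the {t}?", t, 0.5) for t in unseen]

def dialogue_manager(state, providers):
    """Compose a response: prefer high-scoring candidates, penalize topics
    already covered, then mark the chosen topic as covered."""
    candidates = [c for p in providers for c in p(state)]
    ranked = sorted(candidates,
                    key=lambda c: c.score - (1.0 if c.topic in state["covered"] else 0.0),
                    reverse=True)
    best = ranked[0]
    state["covered"].add(best.topic)
    return best.text

state = {"covered": set()}
for _ in range(3):
    print(dialogue_manager(state, [fact_provider, pivot_provider]))
```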
    Detecting Deficient Coverage in Colonoscopies
    Amit Aides
    Ariel Gordon
    Danny Veikherman
    Ilan Moshe Shimshoni
    Tomer Golany
    Yochai Blau
    IEEE Transactions on Medical Imaging (2020)
    Preview abstract Colorectal Cancer (CRC) is a global health problem, resulting in 900K deaths per year. Colonoscopy is the tool of choice for preventing CRC, by detecting polyps before they become cancerous, and removing them. However, colonoscopy is hampered by the fact that endoscopists routinely miss an average of 22-28% of polyps. While some of these missed polyps appear in the endoscopist's field of view, others are missed simply because of substandard coverage of the procedure, i.e. not all of the colon is seen. This paper attempts to rectify the problem of substandard coverage in colonoscopy through the introduction of the C2D2 (Colonoscopy Coverage Deficiency via Depth) algorithm which detects deficient coverage, and can thereby alert the endoscopist to revisit a given area. More specifically, C2D2 consists of two separate algorithms: the first performs depth estimation of the colon given an ordinary RGB video stream; while the second computes coverage given these depth estimates. Rather than compute coverage for the entire colon, our algorithm computes coverage locally, on a segment-by-segment basis; C2D2 can then indicate in real-time whether a particular area of the colon has suffered from deficient coverage, and if so the endoscopist can return to that area. Our coverage algorithm is the first such algorithm to be evaluated in a large-scale way; while our depth estimation technique is the first calibration-free unsupervised method applied to colonoscopies. The C2D2 algorithm achieves state of the art results in the detection of deficient coverage: it is 2.4 times more accurate than human experts. View details
    Customization Scenarios for De-identification of Clinical Notes
    Danny Vainstein
    Gavin Edward Bee
    Jack Po
    Jutta Williams
    Kat Chou
    Ronit Yael Slyper
    Rony Amira
    Shlomo Hoory
    Tzvika Hartman
    BMC Medical Informatics and Decision Making (2020)
    Preview abstract Background: Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets. Objective: We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized. Methods: We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset. Results: Fully customized systems remove 97-99% of personally identifying information. Performance of off-the-shelf systems varies by dataset, with performance mostly above 90%. Providing a small labeled dataset or large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems. Conclusion: Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level. View details
    Preview abstract A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters. View details
    Active Deep Learning to Detect Demographic Traits in Free-Form Clinical Notes
    Danny Vainstein
    Roni Rosenfeld
    Tzvika Hartman
    Journal of Biomedical Informatics (2020)
    Preview abstract The free-form portions of clinical notes are a significant source of information for research, but before they can be used, they must be de-identified to protect patients' privacy. De-identification efforts have focused on known identifier types (names, ages, dates, addresses, ID's, etc.). However, a note can contain residual "Demographic Traits" (DTs), unique enough to re-identify the patient when combined with other such facts. Here we examine whether any residual risks remain after removing these identifiers. After manually annotating over 140,000 words worth of medical notes, we found no remaining directly identifying information, and a low prevalence of demographic traits, such as marital status or housing type. We developed an annotation guide to the discovered Demographic Traits (DTs) and used it to label MIMIC-III and i2b2-2006 clinical notes as test sets. We then designed a "bootstrapped" active learning iterative process for identifying DTs: we tentatively labeled as positive all sentences in the DT-rich note sections, used these to train a binary classifier, manually corrected acute errors, and retrained the classifier. This train-and-correct process may be iterated. Our active learning process significantly improved the classifier's accuracy. Moreover, our BERT-based model outperformed non-neural models when trained on both tentatively labeled data and manually relabeled examples. To facilitate future research and benchmarking, we also produced and made publicly available our human annotated DT-tagged datasets. We conclude that directly identifying information is virtually non-existent in the multiple medical note types we investigated. Demographic traits are present in medical notes, but can be detected with high accuracy using a cost-effective human-in-the-loop active learning process, and redacted if desired. View details
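A hedged sketch of the bootstrapped train-and-correct loop described above, using a bag-of-words logistic regression in place of the BERT-based model. The toy sentences, the section-based bootstrap labels, and the simulated manual correction are all illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tentative labels: everything from a "DT-rich" note section starts as positive.
sentences = ["lives alone in an apartment", "married with two children",
             "denies chest pain", "no shortness of breath",
             "works as a retired teacher", "blood pressure was stable"]
tentative = [1, 1, 1, 0, 1, 0]        # first pass: section-based bootstrap labels

def train(texts, labels):
    vec = CountVectorizer()
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), labels)
    return vec, clf

vec, clf = train(sentences, tentative)

# "Manual correction" step: an annotator fixes the most glaring errors,
# here simulated by flipping one known-wrong bootstrap label.
corrected = list(tentative)
corrected[2] = 0                      # "denies chest pain" is not a demographic trait

# Retrain on the corrected labels; in practice this loop is iterated.
vec, clf = train(sentences, corrected)
print(clf.predict(vec.transform(["she is widowed and lives with her son"])))
```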
    A Deep Learning Architecture for Conservative Dynamical Systems: Application to Rainfall-Runoff Modeling
    Frederik Kratzert
    Daniel Klotz
    Pieter-Jan Hoedt
    Günter Klambauer
    Sepp Hochreiter
    Hoshin Gupta
    Sella Nevo
    NeurIPS AI for Earth Sciences workshop (2020)
    Preview abstract The most accurate and generalizable rainfall-runoff models produced by the hydrological sciences community to date are based on deep learning, and in particular, on Long Short Term Memory networks (LSTMs). Although LSTMs have an explicit state space and gates that mimic input-state-output relationships, these models are not based on physical principles. We propose a deep learning architecture that is based on the LSTM and obeys conservation principles. The model is benchmarked on the mass-conservation problem of simulating streamflow. View details
    Preview abstract Floods are among the most common and deadly natural disasters in the world, and flood warning systems have been shown to be effective in reducing harm. Yet the majority of the world's vulnerable population does not have access to reliable and actionable warning systems, due to core challenges in scalability, computational costs, and data availability. In this paper we present two components of flood forecasting systems which were developed over the past year, providing access to these critical systems to 75 million people who didn't have this access before. View details
    Spectral Algorithm for Shared Low-rank Matrix Regressions
    Yotam Gigi
    Sella Nevo
    Ami Wiesel
    2020 IEEE 11th Sensor Array and Multichannel Signal Processing Workshop (SAM) (2020)
    Preview abstract We consider multiple matrix regression tasks that share common weights in order to reduce sample complexity. For this purpose, we introduce the common mechanism regression model which assumes a shared right low-rank component across all tasks, but allows an individual per-task left low-rank component. We provide a closed form spectral algorithm for recovering the common component and derive a bound on its error as a function of the number of related tasks and the number of samples available for each of them. Both the algorithm and its analysis are natural extensions of known results in the context of phase retrieval and low rank reconstruction. We demonstrate the efficacy of our approach for the challenging task of remote river discharge estimation across multiple river sites, where data for each task is naturally scarce. In this scenario sharing a low-rank component between the tasks translates to a shared spectral reflection of the water, which is a true underlying physical model. We also show the benefit of the approach in the setting of image classification where the common component can be interpreted as the shared convolution filters. View details
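One plausible way to illustrate the shared-component idea described above (not the paper's closed-form estimator): fit each task separately, stack the per-task weight estimates, and take the top right singular vectors of the stack as the shared subspace. The synthetic data and rank choice below are assumptions for the sake of a runnable toy example.

```python
import numpy as np

def shared_right_subspace(task_estimates, k):
    """Stack per-task weight estimates and take the top-k right singular
    vectors as the shared component. Illustrative, not the paper's estimator."""
    stacked = np.vstack(task_estimates)
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k].T          # d x k orthonormal basis for the shared subspace

# --- usage with synthetic tasks that truly share a rank-1 right component ---
rng = np.random.default_rng(0)
d, k, tasks = 8, 1, 5
B = rng.normal(size=(d, k))                            # shared right component
estimates = [rng.normal(size=(3, k)) @ B.T             # per-task left components
             + 0.01 * rng.normal(size=(3, d))          # noise
             for _ in range(tasks)]
B_hat = shared_right_subspace(estimates, k)
# Alignment (up to sign) between the recovered and true shared direction.
print(abs(B_hat[:, 0] @ B[:, 0]) / (np.linalg.norm(B) * np.linalg.norm(B_hat)))
```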
    Preview abstract Automatic speech recognition (ASR) systems have dramatically improved over the last few years. ASR systems are most often trained from ‘typical’ speech, which means that underrepresented groups don’t experience the same level of improvement. In this paper, we present and evaluate finetuning techniques to improve ASR for users with non-standard speech. We focus on two types of non-standard speech: speech from people with amyotrophic lateral sclerosis (ALS) and accented speech. We train personalized models that achieve 62% and 35% relative WER improvement on these two groups, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, to 10% for mild dysarthria and 20% for more serious dysarthria. We show that 76% of the improvement comes from only 5 min of training data. Finetuning a particular subset of layers (with many fewer parameters) often gives better results than finetuning the entire model. This is a first step towards building state-of-the-art ASR models for dysarthric speech. View details
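Finetuning only a subset of layers, as described above, amounts to freezing most parameters and giving the optimizer only the rest. The minimal sketch below does this with a small stand-in network (the real work uses a production ASR model); the layer choice, sizes, and fake batch are assumptions.

```python
import torch
import torch.nn as nn

# A stand-in acoustic model; not the actual ASR architecture.
model = nn.Sequential(
    nn.Linear(80, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 64),            # only this "subset of layers" will be tuned
)

# Freeze everything, then unfreeze just the chosen layer(s).
for p in model.parameters():
    p.requires_grad = False
for p in model[4].parameters():
    p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "of",
      sum(p.numel() for p in model.parameters()), "parameters are being tuned")

# The optimizer only sees the unfrozen parameters.
optimizer = torch.optim.Adam(trainable, lr=1e-4)
features = torch.randn(8, 80)                 # a fake batch of acoustic features
target = torch.randn(8, 64)
loss = nn.functional.mse_loss(model(features), target)
loss.backward()
optimizer.step()
```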
    Preview abstract Named Entity Recognition (NER) has been mostly studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written text. The application of NER in the context of audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. We then present our pipeline for this task, which involves Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline's results on it. View details
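A hedged skeleton of the three-stage pipeline described above: ASR, NER on the transcript, and mapping entity character spans back to audio time spans via word-level alignments. The transcribe/tag functions are stubs standing in for real ASR and NER models, and the word timings are made up.

```python
from typing import List, Tuple

# Stub ASR: returns the transcript plus (word, start_sec, end_sec) alignments.
def transcribe(audio_path: str) -> Tuple[str, List[Tuple[str, float, float]]]:
    words = [("my", 0.0, 0.2), ("name", 0.2, 0.5), ("is", 0.5, 0.6),
             ("jane", 0.6, 1.0), ("doe", 1.0, 1.4)]
    return " ".join(w for w, _, _ in words), words

# Stub NER: returns character spans of entity mentions in the transcript.
def tag_entities(text: str) -> List[Tuple[int, int, str]]:
    start = text.find("jane doe")
    return [(start, start + len("jane doe"), "PERSON")] if start >= 0 else []

def audio_spans_to_redact(audio_path: str) -> List[Tuple[float, float, str]]:
    """Map entity character spans back to audio time spans via word alignments."""
    text, words = transcribe(audio_path)
    # character offset of each word in the transcript
    offsets, pos = [], 0
    for w, s, e in words:
        offsets.append((pos, pos + len(w), s, e))
        pos += len(w) + 1
    spans = []
    for c_start, c_end, label in tag_entities(text):
        hit = [(s, e) for w_start, w_end, s, e in offsets
               if w_start < c_end and w_end > c_start]   # overlapping words
        if hit:
            spans.append((min(s for s, _ in hit), max(e for _, e in hit), label))
    return spans

print(audio_spans_to_redact("visit.wav"))   # -> [(0.6, 1.4, 'PERSON')]
```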
    Preview abstract Optimization of a machine learning model is typically carried out by performing stochastic gradient updates on epochs that consist of randomly ordered training examples. This practice means that each fraction of an epoch comprises an independent random sample of the training data that may not preserve informative structure present in the full data. We hypothesize that training can be more effective, allowing each epoch to provide some of the benefits of multiple ones, with more principled, "self-similar" arrangements. Our case study is matrix factorization, commonly used to learn metric embeddings of entities such as videos or words from example associations. We construct arrangements that preserve the weighted Jaccard similarities of rows and columns and experimentally observe that our arrangements yield training acceleration of 3%-30% on synthetic and recommendation datasets. Principled arrangements of training examples emerge as a novel and potentially powerful performance knob for SGD that merits further exploration. View details
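The similarity notion used above, weighted Jaccard similarity of nonnegative vectors, is the sum of elementwise minima over the sum of elementwise maxima; a minimal sketch follows. The toy association matrix is made up, and the sketch only illustrates the similarity measure itself, not the arrangement construction from the paper.

```python
import numpy as np

def weighted_jaccard(u: np.ndarray, v: np.ndarray) -> float:
    """Weighted Jaccard similarity of two nonnegative vectors:
    sum of elementwise minima over sum of elementwise maxima."""
    denom = np.maximum(u, v).sum()
    return float(np.minimum(u, v).sum() / denom) if denom > 0 else 1.0

# Rows of a toy association matrix (e.g., entity -> weighted co-occurrence counts).
rows = np.array([[3.0, 0.0, 1.0],
                 [2.0, 0.0, 2.0],
                 [0.0, 4.0, 0.0]])
print(weighted_jaccard(rows[0], rows[1]))   # similar rows -> high value (0.6)
print(weighted_jaccard(rows[0], rows[2]))   # dissimilar rows -> 0.0
```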
    ML for Flood Forecasting at Scale
    Sella Nevo
    Ami Wiesel
    Guy Shalev
    Mor Schlesinger
    Oleg Zlydenko
    Ran El-Yaniv
    Yotam Gigi
    Zach Moshe
    Proceedings of the NIPS AI for Social Good Workshop (2018)
    Preview abstract Effective riverine flood forecasting at scale is hindered by a multitude of factors, most notably the need to rely on human calibration in current methodology, the limited amount of data for a specific location, and the computational difficulty of building continent/global level models that are sufficiently accurate. Machine learning (ML) is primed to be useful in this scenario: learned models often surpass human experts in complex high-dimensional scenarios, and the framework of transfer or multitask learning is an appealing solution for leveraging local signals to achieve improved global performance. We propose to build on these strengths and develop ML systems for timely and accurate riverine flood prediction. View details
    Towards Global Remote Discharge Estimation: Using the Few to Estimate The Many
    Yotam Gigi
    Guy Shalev
    Sella Nevo
    Zach Moshe
    Ami Wiesel
    Proceedings of the NeurIPS AI for Social Good Workshop (2018)
    Preview abstract Learning hydrologic models for accurate riverine flood prediction at scale is a challenge of great importance. One of the key difficulties is the need to rely on in-situ river discharge measurements, that can be quite scarce and unreliable, particularly in regions where floods cause the most damage every year. Accordingly, in this work we tackle the problem of river discharge estimation at different river locations. A core characteristic of the data at hand (e.g. satellite measurements) is that we have few measurements for many locations, all sharing the same physics that underlie the water discharge. We capture this phenomenon in a simple but powerful common mechanism regression (CMR) model that has a local component as well as a shared one that captures the global discharge mechanism. The resulting learning objective is non-convex, but we show that we can find its global optimum by leveraging the power of joining local measurements across sites. In particular, using a spectral initialization with provable near-optimal accuracy, we can find the optimum using standard descent methods. We demonstrate the efficacy of our approach for the problem of discharge estimation using simulations. View details
    Preview abstract A long-standing goal of human-computer interaction has been to enable people to have a natural conversation with computers, as they would with each other. In recent years, we have witnessed a revolution in the ability of computers to understand and to generate natural speech, especially with the application of deep neural networks (e.g., Google voice search, WaveNet). Still, even with today’s state of the art systems, it is often frustrating having to talk to stilted computerized voices that don't understand natural language. In particular, automated phone systems are still struggling to recognize simple words and commands. They don’t engage in a conversation flow and force the caller to adjust to the system instead of the system adjusting to the caller. Today we announce Google Duplex, a new technology for conducting natural conversations to carry out “real world” tasks over the phone. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine. View details
    Preview abstract Bluetooth Smart (also known as Bluetooth Low Energy) beacons broadcast their presence in order to enable proximity-based applications by observer devices. This results in a privacy and security exposure: broadcast devices are typically susceptible to tracking and spoofing based on the IDs used by the beacons. We introduce a scheme consisting of cloud-based Ephemeral Identifiers (EID) which allows only authorized parties to properly identify the beacons broadcast; it mitigates the basic tracking and security threats while keeping high utility. We outline a formal model of privacy which is obtained with our scheme, present its implementation, and discuss possible extensions. The proposal outlined here is the basis for Google’s Eddystone standard, supported by some thirty industry partners. We supply an open source implementation for all the components of the system. View details
    Nowcasting with Google Trends
    String Processing and Information Retrieval, Springer (2013), pp. 4
    Preview abstract Since launching Google Trends we have seen extensive interest in what can be learned from search trends. A plethora of studies have shown how to use search trends data for effective nowcasting in diverse areas such as health, finance, economics, politics and more. We give an overview of Google Trends and Nowcasting, highlighting some exciting Big Data challenges, including large scale engineering, effective data analysis, and domain specific considerations. View details
    Contextual OTP: Mitigating Emerging Man-in-the-Middle Attacks with Wireless Hardware Tokens
    Assaf Ben-David
    Omer Berkman
    Sarvar Patel
    Cem Paya
    Applied Cryptography and Network Security - 10th International Conference, ACNS 2012, Springer, pp. 30-47
    Preview
    Norovirus Disease Surveillance Using Google Internet Query Share Data
    Rishi Desai
    Aron J. Hall
    Benjamin A. Lopman
    Yair Shimshoni
    Marcus Rennick
    Niv Efron
    Manish M. Patel
    Umesh D. Parashar
    Clinical Infectious Diseases (2012)
    Preview abstract Google Internet query share (IQS) data for gastroenteritis-related search terms correlated strongly with contemporaneous national (R² = 0.70) and regional (R² = 0.74) norovirus surveillance data in the United States. IQS data may facilitate rapid identification of norovirus season onset, elevated peak activity, and potential emergence of novel strains. View details
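The correlations reported above are coefficients of determination between the query-share series and the surveillance series; as a minimal sketch, R² for a simple least-squares fit can be computed as below. The weekly numbers are made up for illustration.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of a simple least-squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()

# made-up weekly series: query share vs. reported norovirus outbreaks
query_share = np.array([0.8, 1.1, 1.9, 2.5, 3.0, 2.2, 1.4, 0.9])
outbreaks   = np.array([12, 15, 28, 36, 41, 30, 20, 14])
print(round(r_squared(query_share, outbreaks), 2))
```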
    On Big Data Algorithmics (invited talk)
    Algorithms – ESA, Springer (2012), pp. 1
    Preview abstract The extensive use of Big Data has now become common in a plethora of technologies and industries: from massive databases to business intelligence and data mining applications; from search engines to recommendation systems; advancing the state of the art of voice recognition, translation and more. The design, analysis and engineering of Big Data algorithms has multiple flavors, including massive parallelism, streaming algorithms, sketches and synopses, cloud technologies, and more. We will discuss some of these aspects, and reflect on their evolution and on the interplay between the theory and practice of Big Data algorithmics. View details
    Preview abstract This paper presents the "Mind the Gap" initiative that aims to encourage female high school pupils to study computer science (CS) in high school. This is achieved by increasing their awareness to what CS is, and exposing them to the essence of a hi-tech environment and to same gender role models. The initiative was undertaken by female software engineers at Google's Israel R&D Center in collaboration with the Israeli National Center for Computer Science Teachers. We describe the initiative and its impact on the female pupils' interest in CS. One of our conclusions is that even a short visit to a hi-tech company, in this case – Google, has the potential to change pupils' perception of what CS is and to increase their interest in CS and their desire to study it. Our initiative can be easily adapted by other companies and can be scaled to impact a rather large population. View details
    Suggesting (More) Friends Using the Implicit Social Graph
    Maayan Roth
    Tzvika Barenholz
    Assaf Ben-David
    Guy Flysher
    Ilan Horn
    Ari Leichtberg
    Ron Merom
    International Conference on Machine Learning (ICML) (2011)
    Preview abstract Although users of online communication tools rarely categorize their contacts into groups such as "family", "co-workers", or "jogging buddies", they nonetheless implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this paper, we describe the implicit social graph which is formed by users' interactions with contacts and groups of contacts, and which is distinct from explicit social graphs in which users explicitly add other individuals as their "friends". We introduce an interaction-based metric for estimating a user's affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user's implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends. We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends. Finally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail features. View details
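A hedged sketch of the two ingredients described above: an interaction-based affinity score (here, a recency-decayed count of shared threads) and a seed-expansion step that ranks non-seed contacts by their affinity-weighted co-occurrence with the seed set in implicit groups. The decay constant, the toy interaction log, and the scoring formula are illustrative assumptions, not the Gmail algorithm.

```python
import math
from collections import defaultdict

# Toy interaction log: (days_ago, set of contacts on the same email thread).
interactions = [
    (1,  {"alice", "bob"}),
    (3,  {"alice", "bob", "carol"}),
    (10, {"carol", "dave"}),
    (2,  {"alice", "bob", "dave"}),
]

HALF_LIFE_DAYS = 7.0   # illustrative recency decay

def weight(days_ago: float) -> float:
    return math.exp(-math.log(2) * days_ago / HALF_LIFE_DAYS)

def suggest(seed: set, top_k: int = 2):
    """Rank non-seed contacts by affinity accumulated from implicit groups
    (interaction threads) that also contain seed members."""
    scores = defaultdict(float)
    for days_ago, group in interactions:
        overlap = len(group & seed)
        if not overlap:
            continue
        for contact in group - seed:
            scores[contact] += weight(days_ago) * overlap
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

print(suggest({"alice", "bob"}))   # e.g. [('dave', ...), ('carol', ...)]
```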
    Suggesting Friends Using the Implicit Social Graph
    Maayan Roth
    Assaf Ben-David
    Guy Flysher
    Ilan Horn
    Ari Leichtberg
    Ron Merom
    Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2010)
    Preview abstract Although users of online communication tools rarely categorize their contacts into groups such as "family", "co-workers", or "jogging buddies", they nonetheless implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this paper, we describe the implicit social graph which is formed by users' interactions with contacts and groups of contacts, and which is distinct from explicit social graphs in which users explicitly add other individuals as their "friends". We introduce an interaction-based metric for estimating a user's affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user's implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends. We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends. Finally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail Labs features. View details
    Google’s Auction for Radio and TV Ads
    Noam Nisan
    Jason Bayer
    Deepak Chandra
    Tal Franji
    Robert Gardner
    Neil Rhodes
    Misha Seltzer
    Hal Varian
    Dan Zigmond
    Google, Inc. (2009)
    Preview abstract This document describes the auction system used by Google for allocation and pricing of TV ads and Radio ads. It is based on a simultaneous ascending auction, and has been in use since September 2008. View details
    Efficient Pebbling for List Traversal Synopses with Application to Program Rollback
    Ely Porat
    Theoretical Computer Science, vol. 379, issue 3 (2007), pp. 418-436
    Preview
    The Design and Architecture of the τ-Synopses System
    Leon Portman
    Natasha Drukh
    EDBT (2006), pp. 1088-1091
    Synopses reconciliation via calibration in the τ-Synopses system
    Yariv Matia
    Leon Portman
    EDBT (2006), pp. 1139-1142
    Trends in high performance analytics
    SIGMOD Conference (2006), pp. 720
    Data Streams and Data Synopses for Massive Data Sets (invited talk)
    PKDD, Springer (2005), pp. 8-9
    Delayed dictionary compression for packet networks
    Raanan Refua
    Infocom, IEEE (2005)
    Optimal Workload-Based Weighted Wavelet Synopses
    Daniel Urieli
    ICDT (2005), pp. 368-382
    τ-Synopses: A System for Run-Time Management of Remote Synopses
    Leon Portman
    ICDE (2004), pp. 864-865
    Adaptive Probing and Communication in Sensor Networks
    Iftach Ragoler
    Nimrod Aviram
    ADHOC-NOW (2004), pp. 280-293
    Fractional XSketch Synopses for XML Databases
    Natasha Drukh
    Neoklis Polyzotis
    Minos N. Garofalakis
    XSym (2004), pp. 189-203
    τ-Synopses: A System for Run-Time Management of Remote Synopses
    Leon Portman
    EDBT (2004), pp. 865-867
    Dynamic Generation of Discrete Random Variates
    Jeffrey Scott Vitter
    Wen-Chun Ni
    Theory Comput. Syst., vol. 36 (2003), pp. 329-358
    Efficient Pebbling for List Traversal Synopses
    Ely Porat
    ICALP (2003), pp. 918-928
    Spectral Bloom Filters
    Saar Cohen
    SIGMOD Conference (2003), pp. 241-252
    Online Subpath Profiling
    David Oren
    Shmuel Sagiv
    CC (2002), pp. 78-94
    Tracking Join and Self-Join Sizes in Limited Storage
    Noga Alon
    Phillip B. Gibbons
    Mario Szegedy
    J. Comput. Syst. Sci., vol. 64 (2002), pp. 719-747
    Fast incremental maintenance of approximate histograms
    Phillip B. Gibbons
    Viswanath Poosala
    ACM Trans. Database Syst., vol. 27 (2002), pp. 261-298
    Placing search in context: the concept revisited
    Lev Finkelstein
    Evgeniy Gabrilovich
    Zach Solan
    Gadi Wolfman
    Eytan Ruppin
    ACM Trans. Inf. Syst., vol. 20 (2002), pp. 116-131
    The Effect of Flexible Parsing for Dynamic Dictionary-Based Data Compression
    Nasir Rajpoot
    S
    ACM Journal of Experimental Algorithms, vol. 6 (2001), pp. 10
    Placing search in context: the concept revisited
    Lev Finkelstein
    Evgeniy Gabrilovich
    Zach Solan
    Gadi Wolfman
    Eytan Ruppin
    WWW (2001), pp. 406-414
    Efficient bundle sorting
    Eran Segal
    Jeffrey Scott Vitter
    SODA (2000), pp. 839-848
    Guest Editors' Foreword
    Thomas H. Cormen
    Frank K. H. A. Dehne
    Pierre Fraigniaud
    Theory Comput. Syst., vol. 33 (2000), pp. 335-335
    Context-based Space Filling Curves
    Revital Dafner
    Daniel Cohen-Or
    Comput. Graph. Forum, vol. 19 (2000)
    Dynamic Maintenance of Wavelet-Based Histograms
    Jeffrey Scott Vitter
    Min Wang
    VLDB (2000), pp. 101-110
    On the temporal HZY compression scheme
    Z. Cohen
    S. Muthukrishnan
    S. Cenk Sahinalp
    Jacob Ziv
    SODA (2000), pp. 185-186
    On secure and pseudonymous client-relationships with multiple servers
    Eran Gabber
    Phillip B. Gibbons
    David M. Kristol
    Alain J. Mayer
    ACM Trans. Inf. Syst. Secur., vol. 2 (1999), pp. 390-415
    Synopsis Data Structures for Massive Data Sets
    Phillip B. Gibbons
    SODA (1999), pp. 909-910
    The Space Complexity of Approximating the Frequency Moments
    Noga Alon
    Mario Szegedy
    J. Comput. Syst. Sci., vol. 58 (1999), pp. 137-147
    Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
    Phillip B. Gibbons
    Vijaya Ramachandran
    Theory Comput. Syst., vol. 32 (1999), pp. 327-359
    On the Optimality of Parsing in Dynamic Dictionary Based Data Compression
    S
    SODA (1999), pp. 943-944
    The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression
    Nasir Rajpoot
    S
    Data Compression Conference (1999), pp. 238-246
    Consistent, Yet Anonymous, Web Access with LPWA
    Eran Gabber
    Phillip B. Gibbons
    David M. Kristol
    Alain J. Mayer
    Commun. ACM, vol. 42 (1999), pp. 42-47
    Modeling and Optimizing I/O Throughput of Multiple Disks on a Bus
    Rakesh D. Barve
    Elizabeth A. M. Shriver
    Phillip B. Gibbons
    Bruce Hillyer
    Jeffrey Scott Vitter
    SIGMETRICS (1999), pp. 83-92
    Modeling Parallel Bandwidth: Local versus Global Restrictions
    Micah Adler
    Phillip B. Gibbons
    Vijaya Ramachandran
    Algorithmica, vol. 24 (1999), pp. 381-404
    Provably Efficient Scheduling for Languages with Fine-Grained Parallelism
    Guy E. Blelloch
    Phillip B. Gibbons
    J. ACM, vol. 46 (1999), pp. 281-321
    Tracking Join and Self-Join Sizes in Limited Storage
    Noga Alon
    Phillip B. Gibbons
    Mario Szegedy
    PODS (1999), pp. 10-20
    An Optical Simulation of Shared Memory
    Leslie Ann Goldberg
    Satish Rao
    SIAM J. Comput., vol. 28 (1999), pp. 1829-1847
    Round-Like Behavior in Multiple Disks on a Bus
    Rakesh D. Barve
    Phillip B. Gibbons
    Bruce Hillyer
    Elizabeth A. M. Shriver
    Jeffrey Scott Vitter
    IOPADS (1999), pp. 1-9
    Wavelet-Based Histograms for Selectivity Estimation
    Jeffrey Scott Vitter
    Min Wang
    SIGMOD Conference (1998), pp. 448-459
    The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms
    Phillip B. Gibbons
    Vijaya Ramachandran
    SIAM J. Comput., vol. 28 (1998), pp. 733-769
    Curbing Junk E-Mail via Secure Classification
    Eran Gabber
    Markus Jakobsson
    Alain J. Mayer
    Financial Cryptography (1998), pp. 198-213
    Augmenting Suffix Trees, with Applications
    S. Muthukrishnan
    S. Cenk Sahinalp
    Jacob Ziv
    ESA (1998), pp. 67-78
    The Queue-Read Queue-Write Asynchronous PRAM Model
    Phillip B. Gibbons
    Vijaya Ramachandran
    Theor. Comput. Sci., vol. 196 (1998), pp. 3-29
    Implementation and Experimental Evaluation of Flexible Parsing for Dynamic Dictionary Based Data Compression
    Nasir Rajpoot
    S
    Algorithm Engineering (1998), pp. 49-61
    Modeling and Optimizing I/O Throughput of Multiple Disks on a Bus (Summary)
    Rakesh D. Barve
    Elizabeth A. M. Shriver
    Phillip B. Gibbons
    Bruce Hillyer
    Jeffrey Scott Vitter
    SIGMETRICS (1998), pp. 264-265
    Simple Fast Parallel Hashing by Oblivious Execution
    Joseph Gil
    SIAM J. Comput., vol. 27 (1998), pp. 1348-1375
    New Sampling-Based Summary Statistics for Improving Approximate Query Answers
    Phillip B. Gibbons
    SIGMOD Conference (1998), pp. 331-342
    Can Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
    Phillip B. Gibbons
    Vijaya Ramachandran
    SPAA (1997), pp. 72-83
    Space-Efficient Scheduling of Parallelism with Synchronization Variables
    Guy E. Blelloch
    Phillip B. Gibbons
    Girija J. Narlikar
    SPAA (1997), pp. 12-23
    Fast Incremental Maintenance of Approximate Histograms
    Phillip B. Gibbons
    Viswanath Poosala
    VLDB (1997), pp. 466-475
    How to Make Personalized Web Browsing Simple, Secure, and Anonymous
    Eran Gabber
    Phillip B. Gibbons
    Alain J. Mayer
    Financial Cryptography (1997), pp. 17-32
    Modeling Parallel Bandwidth: Local vs. Global Restrictions
    Micah Adler
    Phillip B. Gibbons
    Vijaya Ramachandran
    SPAA (1997), pp. 94-105
    Lightweight Security Primitives for E-Commerce
    Alain J. Mayer
    Abraham Silberschatz
    USENIX Symposium on Internet Technologies and Systems (1997)
    Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors
    Guy E. Blelloch
    Phillip B. Gibbons
    Marco Zagha
    IEEE Trans. Parallel Distrib. Syst., vol. 8 (1997), pp. 943-958
    Efficient Low-Contention Parallel Algorithms
    Phillip B. Gibbons
    Vijaya Ramachandran
    J. Comput. Syst. Sci., vol. 53 (1996), pp. 417-442
    The Queue-Read Queue-Write Asynchronous PRAM Model
    Phillip B. Gibbons
    Vijaya Ramachandran
    Euro-Par, Vol. II (1996), pp. 279-292
    The Space Complexity of Approximating the Frequency Moments
    Noga Alon
    Mario Szegedy
    STOC (1996), pp. 20-29
    An Effective Load Balancing Policy for Geometric-Decaying Algorithms
    Joseph Gil
    J. Parallel Distrib. Comput., vol. 36 (1996), pp. 185-188
    Modeling Skewed Distribution Using Multifractals and the `80-20' Law
    Christos Faloutsos
    Abraham Silberschatz
    VLDB (1996), pp. 307-317
    Shuffling Biological Sequences
    Denise B. Kandel
    Ron Unger
    Peter Winkler
    Discrete Applied Mathematics, vol. 71 (1996), pp. 171-185
    Bifocal Sampling for Skew-Resistant Join Size Estimation
    Sumit Ganguly
    Phillip B. Gibbons
    Abraham Silberschatz
    SIGMOD Conference (1996), pp. 271-281
    Frequency-Spatial Transformation: A Proposal for Parsimonious Intra-Cortical Communication
    Regev Levi
    Eytan Ruppin
    James A. Reggia
    Int. J. Neural Syst., vol. 7 (1996), pp. 591-598
    Asynchrony versus Bulk-Synchrony in QRQW PRAM model (Abstract)
    Phillip B. Gibbons
    Vijaya Ramachandran
    PODC (1996), pp. 176
    Provably Efficient Scheduling for Languages with Fine-Grained Parallelism
    Guy E. Blelloch
    Phillip B. Gibbons
    SPAA (1995), pp. 1-12
    A Simple Randomized Sieve Algorithm for the Closest-Pair Problem
    Samir Khuller
    Inf. Comput., vol. 118 (1995), pp. 34-37
    Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors
    Guy E. Blelloch
    Phillip B. Gibbons
    Marco Zagha
    SPAA (1995), pp. 84-94
    Approximate Data Structures with Applications
    Jeffrey Scott Vitter
    Neal E. Young
    SODA (1994), pp. 187-194
    Designing Algorithms by Expectations
    Joseph Gil
    Inf. Process. Lett., vol. 51 (1994), pp. 31-34
    Optimal Parallel Approximation for Prefix Sums and Integer Sorting
    Michael T. Goodrich
    Uzi Vishkin
    SODA (1994), pp. 241-250
    Efficient Low-Contention Parallel Algorithms
    Phillip B. Gibbons
    Vijaya Ramachandran
    SPAA (1994), pp. 236-247
    Elections in Anonymous Networks
    Yehuda Afek
    Inf. Comput., vol. 113 (1994), pp. 312-330
    Simple Fast Parallel Hashing
    Joseph Gil
    ICALP (1994), pp. 239-250
    The QRQW PRAM: Accounting for Contention in Parallel Algorithms
    Phillip B. Gibbons
    Vijaya Ramachandran
    SODA (1994), pp. 638-648
    An Optical Simulation of Shared Memory
    Leslie Ann Goldberg
    Satish Rao
    SPAA (1994), pp. 257-267
    Approximate Parallel Prefix Computation and its Applications
    Michael T. Goodrich
    Uzi Vishkin
    IPPS (1993), pp. 318-325
    Dynamic Generation of Discrete Random Variates
    Jeffrey Scott Vitter
    Wen-Chun Ni
    SODA (1993), pp. 361-370
    Semi-dynamic Closest-pair Algorithms
    CCCG (1993), pp. 264-271
    Randomized Range-Maxima in Nearly-Constant Parallel Time
    Omer Berkman
    Uzi Vishkin
    Computational Complexity, vol. 2 (1992), pp. 350-373
    Polynomial Hash Functions Are Reliable
    Martin Dietzfelbinger
    Joseph Gil
    Nicholas Pippenger
    ICALP (1992), pp. 235-246
    Efficient Randomized Dictionary Matching Algorithms
    Amihood Amir
    Martin Farach
    CPM (1992), pp. 262-275
    On Parallel Hashing and Integer Sorting
    Uzi Vishkin
    J. Algorithms, vol. 12 (1991), pp. 573-606
    Converting High Probability into Nearly-Constant Time-with Applications to Parallel Hashing
    Uzi Vishkin
    STOC (1991), pp. 307-316
    Towards a Theory of Nearly Constant Time Parallel Algorithms
    Joseph Gil
    Uzi Vishkin
    FOCS (1991), pp. 698-710
    On Parallel Hashing and Integer Sorting (Extended Summary)
    Uzi Vishkin
    ICALP (1990), pp. 729-743
    Simple and Efficient Election Algorithms for Anonymous Networks
    Yehuda Afek
    WDAG (1989), pp. 183-194
    A Video Scrambling Technique Based On Space Filling Curves
    Adi Shamir
    CRYPTO (1987), pp. 398-417