George

I'm a Research Scientist in Computer Vision and Machine Learning. After completing my Ph.D., I was a postdoc at MIT, developing CV/ML solutions for affordable and accessible healthcare. I then joined Amazon Research, where I developed CV systems for Amazon Go and Amazon One, and later worked at Google Health and Verily on ML/CV healthcare projects. In my current role at Google Research, I work on satellite and aerial imagery, developing vision-language foundation models.
Authored Publications
    Enhancing Remote Sensing Representations through Mixed-Modality Masked Autoencoding
    Ori Linial
    Yochai Blau
    Nadav Sherman
    Yotam Gigi
    Wojciech Sirko
    Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops (2025), pp. 507-516
    This paper presents an innovative approach to pre-training models for remote sensing by integrating optical and radar data from Sentinel-2 and Sentinel-1 satellites. Using a novel variation on the masked autoencoder (MAE) framework, our model incorporates a dual-task setup: reconstructing masked Sentinel-2 images and predicting corresponding Sentinel-1 images. This multi-task design enables the encoder to capture both spectral and structural features across diverse environmental conditions. Additionally, we introduce a "mixing" strategy in the pretraining phase, combining patches from both image sources, which mitigates spatial misalignment errors and enhances model robustness. Evaluation on segmentation and classification tasks, including Sen1Floods11 and BigEarthNet, demonstrates significant improvements in adaptability and generalizability across varied downstream remote sensing applications. Our findings highlight the advantages of leveraging complementary modalities for more resilient and versatile land cover analysis.
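    The mixing-and-masking step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, parameters, and the per-patch Bernoulli mixing are assumptions; only the overall idea (mask most patches, draw each visible patch from either modality, reconstruct masked Sentinel-2 patches and predict the corresponding Sentinel-1 patches) comes from the abstract.

    ```python
    import numpy as np

    def mix_and_mask_patches(s2_patches, s1_patches, mask_ratio=0.75,
                             mix_prob=0.5, rng=None):
        """Illustrative mixed-modality MAE step (names are hypothetical).

        Each spatial patch is masked with probability `mask_ratio`; every
        visible patch is drawn from either the Sentinel-1 or Sentinel-2
        image with probability `mix_prob`.
        """
        rng = np.random.default_rng(rng)
        n = s2_patches.shape[0]
        visible = rng.random(n) >= mask_ratio      # patches fed to the encoder
        use_s1 = rng.random(n) < mix_prob          # modality of each patch
        mixed = np.where(use_s1[:, None], s1_patches, s2_patches)
        encoder_input = mixed[visible]
        # Dual-task targets on the masked positions: reconstruct the
        # Sentinel-2 patches and predict the corresponding Sentinel-1 ones.
        targets_s2 = s2_patches[~visible]
        targets_s1 = s1_patches[~visible]
        return encoder_input, targets_s2, targets_s1, visible
    ```

    In a real pipeline the encoder would see only `encoder_input` plus position embeddings, and a decoder head per modality would regress the two target sets.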
    A Recipe for Improving Remote Sensing Zero Shot Generalization
    Aviad Barzilai
    Yotam Gigi
    Vered Silverman
    Yehonathan Refael
    Bolous Jaber
    Amr Helmy
    3rd ML4RS Workshop at ICLR 2025
    Foundation models have had a significant impact across various AI applications, enabling use cases that were previously impossible. Visual language models (VLMs), in particular, have outperformed other techniques in many tasks. In remote sensing (RS), foundation models have shown improvements across various applications. However, unlike in other fields, the use of VLMs with large-scale remote sensing image-text datasets remains limited. In this work, we first introduce two novel image-caption datasets for training remote sensing foundation models. The first dataset pairs aerial and satellite imagery, aligned with Google-Maps data, with high-quality captions generated using Gemini. The second utilizes public web images and their corresponding alt-text, filtered for the remote sensing domain, resulting in a highly diverse dataset. We show that using these datasets to pre-train Mammut, a VLM architecture, results in state-of-the-art generalization performance in zero-shot classification and cross-modal retrieval on well-known public benchmarks. Secondly, we leverage this newly pre-trained VLM to generate inference attention maps for a novel class query (i.e., a class unseen during training). We subsequently propose an iterative self-supervised fine-tuning approach where samples aligned with these attention maps are iteratively pseudo-labeled and utilized for model training.
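    The iterative pseudo-labeling loop from the abstract can be sketched generically. This is a hedged toy version: `fit_fn` and `predict_fn` stand in for the VLM fine-tuning and attention-map scoring steps, and the top-fraction selection rule is an assumption; the abstract only specifies that attention-aligned samples are iteratively pseudo-labeled and used for training.

    ```python
    import numpy as np

    def iterative_pseudo_label(scores, fit_fn, predict_fn,
                               rounds=3, top_frac=0.2):
        """Illustrative iterative self-supervised fine-tuning loop.

        Each round, the highest-scoring unlabeled samples (scores derived,
        e.g., from attention maps for the novel class) are pseudo-labeled
        and the model is refit on them; scores are then recomputed.
        """
        scores = np.asarray(scores, dtype=float)
        labeled = np.zeros(len(scores), dtype=bool)
        for _ in range(rounds):
            k = max(1, int(top_frac * len(scores)))
            order = np.argsort(-scores)                     # best first
            new = [i for i in order if not labeled[i]][:k]  # top-k unlabeled
            labeled[new] = True
            fit_fn(new)              # fine-tune on the new pseudo-labels
            scores = np.asarray(predict_fn(), dtype=float)  # re-score
        return labeled
    ```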
    Pixel-accurate Segmentation of Surgical Tools based on Bounding Box Annotations
    Tomer Golany
    Daniel Freedman
    Ehud Rivlin
    Amit Aides
    2022 26th International Conference on Pattern Recognition (ICPR), pp. 5096-5103
    Detection and segmentation of surgical instruments is an important problem for laparoscopic surgery. Accurate pixel-wise instrument segmentation is used as an intermediate task in the development of computer-assisted surgery systems, such as pose estimation, surgical phase estimation, enhanced image fusion, video retrieval and others. In this paper we describe our deep learning-based approach for instrument segmentation, which addresses the binary segmentation problem, where every pixel in an image is labeled as instrument or background. Our approach relies on weak annotations provided as bounding boxes of the instruments, which are much faster and cheaper to obtain than dense pixel-level annotations. To improve the accuracy even further we propose a novel approach to generate synthetic training images. Our approach achieves state-of-the-art results, outperforming previously proposed methods for automatic instrument segmentation based on weak annotations only.
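    A common starting point for this kind of weak supervision, sketched below, is to rasterize the box annotations into a coarse binary mask that a segmentation network can be trained against. This is an illustrative helper, not the paper's method; the function name and the (y0, x0, y1, x1) box format are assumptions.

    ```python
    import numpy as np

    def boxes_to_weak_mask(shape, boxes):
        """Turn bounding-box annotations into a coarse binary mask
        (1 = possibly instrument, 0 = background) for weak supervision.

        `boxes` is an iterable of (y0, x0, y1, x1) pixel coordinates,
        half-open on the bottom/right edges (illustrative convention).
        """
        mask = np.zeros(shape, dtype=np.uint8)
        for y0, x0, y1, x1 in boxes:
            mask[y0:y1, x0:x1] = 1   # every pixel inside a box is positive
        return mask
    ```

    In practice such a mask is only an upper bound on the true instrument region, which is why the paper's method refines it rather than using it as ground truth.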