Adam Grycner

Authored Publications
    PaLI: A Jointly-Scaled Multilingual Language-Image Model
    Piotr Padlewski
    Daniel Salz
    Sebastian Alexander Goodman
    Basil Mustafa
    Lucas Beyer
    Alexander Kolesnikov
    Keran Rong
    Hassan Akbari
    Linting Xue
    James Bradbury
    Chao Jia
    Carlos Riquelme
    Xiaohua Zhai
    Neil Houlsby
    International Conference on Learning Representations (ICLR) (2023)
    Abstract: Effective scaling and a flexible task interface enable large-capacity language models to excel at many tasks. PaLI (Pathways Language and Image model) extends these ideas to the joint modeling of language and vision. PaLI is a model that generates text based on visual and textual inputs. Using this API, PaLI is able to perform many vision, language, and multimodal tasks, across many languages. We train PaLI with two main principles: reuse of pretrained unimodal components, and joint scaling of modalities. Using large-capacity pretrained language models and vision models allows us to capitalize on their existing capabilities, while leveraging the substantial cost of training them. We scale PaLI models across three axes: the language component, the vision component, and the training data that fuses them. For the vision component, we train the largest and best-performing Vision Transformer (ViT) to date. For the data, we build an image-text training set over 10B images covering over 100 languages. PaLI inherits and enhances language-understanding capabilities, and achieves state-of-the-art results in multiple vision and language tasks (image classification, image captioning, visual question answering, scene-text understanding, etc.), based on a simple, modular, and reuse-friendly platform for modeling and scaling.