- Rosanne Liu
- Dan Garrette
- Chitwan Saharia
- William Chan
- Adam Roberts
- Sharan Narang
- Irina Blok
- RJ Mical
- Mohammad Norouzi
- Noah Constant
Abstract
Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text encoders. In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell). Applying our learnings to the visual domain, we train a suite of image generation models, and show that character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks (our DrawText benchmark). Our models set a much higher state-of-the-art on visual spelling, with 30+ point accuracy gains over competitors on rare words, despite training on far fewer examples.
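To make the character-aware vs. character-blind distinction concrete, here is a minimal sketch (using the Hugging Face `transformers` library, not the authors' code) contrasting a subword tokenizer of the kind used by popular text-to-image models with a byte-level tokenizer such as ByT5's. The checkpoint names and the exact subword pieces shown in the comments are illustrative assumptions; the point is that a subword model receives a word as a few opaque IDs that hide its glyph sequence, while a byte-level model sees every character.

```python
# Sketch only: contrasts character-blind subword tokenization with
# character-aware byte-level tokenization. Checkpoint names are assumptions.
from transformers import AutoTokenizer

subword = AutoTokenizer.from_pretrained("t5-small")              # character-blind (SentencePiece)
byte_level = AutoTokenizer.from_pretrained("google/byt5-small")  # character-aware (UTF-8 bytes)

word = "asparagus"
# Subword pieces hide the glyphs, e.g. ['▁as', 'par', 'agus'] (exact pieces vocab-dependent).
print(subword.tokenize(word))
# One token per UTF-8 byte, so the spelling is directly visible:
# ['a', 's', 'p', 'a', 'r', 'a', 'g', 'u', 's']
print(byte_level.tokenize(word))
```

Under this view, a spelling task like WikiSpell is trivially readable from a byte-level model's input but must be memorized per-token by a subword model, which is consistent with the accuracy gap the paper reports on rare words.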