Position-Invariant Truecasing with a Word-and-Character Hierarchical Recurrent Neural Network

Hao Zhang; You-Chi Cheng; Shankar Kumar; Mingqing Chen; Rajiv Mathews

EMNLP (2021)

Abstract

Truecasing is the task of restoring the correct case (uppercase or lowercase) of noisy text, whether generated by an automatic system such as speech recognition or machine translation, or written by humans. It improves the performance of downstream NLP tasks such as named entity recognition and language modeling. We propose a fast, accurate, and compact two-level hierarchical word-and-character-based recurrent neural network model, the first of its kind for this problem. Using sequence distillation, we also address the problem of truecasing while ignoring token positions in the sentence, i.e., in a position-invariant manner.
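
To make the task concrete, here is a minimal sketch of truecasing framed as per-character case labeling within each word. The abstract does not specify this label scheme; the `U`/`L` labels, the function names `case_labels` and `apply_labels`, and the example sentence are all assumptions for illustration. A trained model would predict the labels from the lowercased input, whereas this sketch reads them off a reference sentence only to show the input and output shapes.

```python
from typing import List


def case_labels(cased: str) -> List[List[str]]:
    """Return, for each whitespace-separated word, one label per character.

    'U' marks an uppercase character and 'L' marks anything else.
    This label scheme is an illustrative assumption, not the paper's.
    """
    return [["U" if ch.isupper() else "L" for ch in word] for word in cased.split()]


def apply_labels(lowercased: str, labels: List[List[str]]) -> str:
    """Restore case by uppercasing every character whose label is 'U'."""
    restored_words = []
    for word, word_labels in zip(lowercased.split(), labels):
        restored_words.append(
            "".join(ch.upper() if lab == "U" else ch for ch, lab in zip(word, word_labels))
        )
    return " ".join(restored_words)


# Hypothetical example sentence; mixed-case words such as "McDonald's"
# are why labels are kept per character rather than per word.
reference = "alice visited McDonald's in New York"
noisy = reference.lower()          # the kind of input a truecaser receives
labels = case_labels(reference)    # stand-in for model predictions
restored = apply_labels(noisy, labels)
```

Grouping the labels by word mirrors the two-level word-and-character view described in the abstract, but only as a data layout; nothing here models the recurrent network or the sequence distillation step.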
