PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS

Ye Jia

Heiga Zen (Byungha Chun)

Jonathan Shen

Yu Zhang

Yonghui Wu

Interspeech (2021)


Abstract

This paper introduces a new encoder model for neural TTS. The proposed model, called PnG BERT, augments the original BERT model by taking as input both the phoneme and grapheme representations of a text, as well as the word-level alignment between them. It can be pre-trained on a large text corpus in a self-supervised manner and then fine-tuned on a TTS task. The experimental results suggest that PnG BERT can significantly improve the performance of a state-of-the-art neural TTS model by producing more appropriate prosody and more accurate pronunciation. In a subjective side-by-side preference evaluation, raters showed no statistically significant preference between the synthesized speech and the ground truth recordings from professional speakers.
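
The abstract does not spell out how the three inputs are laid out, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes a BERT-style two-segment sequence (phonemes first, graphemes second) and encodes the word-level alignment as a shared per-token word index. The function name `build_png_input`, the `[CLS]`/`[SEP]` marker strings, the use of index 0 for special tokens, and the ARPAbet-like phonemes in the example are all illustrative assumptions.

```python
from typing import List, Tuple

# Special markers borrowed from BERT conventions (assumed, not from the abstract).
CLS, SEP = "[CLS]", "[SEP]"


def build_png_input(
    words: List[Tuple[List[str], List[str]]],
) -> Tuple[List[str], List[int], List[int]]:
    """Build one joint phoneme + grapheme sequence.

    `words` holds one (phoneme_tokens, grapheme_tokens) pair per word.
    Returns (tokens, segment_ids, word_ids), where segment 0 is the phoneme
    segment, segment 1 is the grapheme segment, and word_ids gives the
    1-based index of the word each token belongs to (0 for special tokens).
    """
    tokens, segment_ids, word_ids = [CLS], [0], [0]

    # Segment 0: all phonemes, each tagged with its word index.
    for idx, (phonemes, _) in enumerate(words, start=1):
        for p in phonemes:
            tokens.append(p)
            segment_ids.append(0)
            word_ids.append(idx)
    tokens.append(SEP)
    segment_ids.append(0)
    word_ids.append(0)

    # Segment 1: all graphemes, reusing the same word indices so that the
    # two views of a word are aligned through a shared word id.
    for idx, (_, graphemes) in enumerate(words, start=1):
        for g in graphemes:
            tokens.append(g)
            segment_ids.append(1)
            word_ids.append(idx)
    tokens.append(SEP)
    segment_ids.append(1)
    word_ids.append(0)

    return tokens, segment_ids, word_ids


# Hypothetical two-word example; the phoneme strings are illustrative only.
example = [(["HH", "AY"], ["hi"]), (["DH", "EH", "R"], ["there"])]
tokens, segment_ids, word_ids = build_png_input(example)
```

In this sketch a model would look up three embeddings per position (token, segment, and word index) and sum them, in the same way BERT sums token, segment, and position embeddings. Whether PnG BERT combines them exactly this way is not stated in the abstract.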

Research Areas

Natural Language Processing
Speech Processing

