Sequence-to-sequence Neural Network model with 2D attention for learning Japanese pitch accents

Antoine Bruguier; Heiga Zen; Arkady Arkhangorodsky

Interspeech, 2018

Abstract

Many Japanese text-to-speech (TTS) systems use word-level pitch accents as one of their prosodic features. A combination of a pronunciation dictionary that includes lexical pitch accents and a statistical model of word accent sandhi is often used to predict pitch accents from text. However, using human transcribers to build the dictionary and the training data for the model is tedious and expensive. This paper proposes a neural pitch accent recognition model. The model combines information from audio and its transcription (a word sequence in hiragana characters) via two-dimensional attention and outputs word-level pitch accents. Experimental results show a lower word pitch accent prediction error rate than with text only. The model thus reduces the load on human annotators when building a pronunciation dictionary. Because the approach is general, it can also be used for pronunciation learning in other languages.
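
The abstract does not spell out how the two-dimensional attention is computed, so the following is only an illustrative sketch of one plausible reading: a single decoder query attends along two axes, the hiragana character positions and the audio frames, and the two resulting context vectors are concatenated. The function names (`attend`, `two_axis_attention`), the dot-product scoring, and the toy vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, states):
    """Dot-product attention (an assumed scoring function).

    Returns the attention weights over `states` and the weighted-sum context.
    """
    weights = softmax([dot(query, s) for s in states])
    dim = len(states[0])
    context = [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]
    return weights, context

def two_axis_attention(query, text_states, audio_states):
    """Hypothetical helper: attend over the text axis and the audio axis
    separately, then concatenate the two context vectors."""
    text_w, text_ctx = attend(query, text_states)
    audio_w, audio_ctx = attend(query, audio_states)
    return text_w, audio_w, text_ctx + audio_ctx

# Toy encoder outputs (made-up numbers): 3 hiragana positions, 4 audio frames.
query = [1.0, 0.0]
text_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
audio_states = [[0.2, 0.1], [0.9, 0.0], [0.1, 0.8], [0.4, 0.4]]

text_w, audio_w, context = two_axis_attention(query, text_states, audio_states)
```

In a full model, a combined context of this kind would feed a decoder step that emits one pitch accent label per word. That part is omitted here because the abstract gives no detail on it.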
