Modelling Intonation in Spectrograms for Neural Vocoder based Text-to-Speech

Vincent Wan

Jonathan Shen

Hanna Silen

Rob Clark

Speech Prosody 2020

Abstract

Intonation is characterized by rises and falls in pitch and energy. In previous work, we explicitly modelled these prosodic features using Clockwork Hierarchical Variational Autoencoders (CHiVE) to show that we can generate multiple intonation contours for any text. However, recent text-to-speech synthesis systems produce spectrograms, which are inverted by neural vocoders to produce waveforms. Spectrograms encode intonation in a complex way; there is no simple, explicit representation analogous to pitch (fundamental frequency) and energy. In this paper, we extend CHiVE to model intonation within a spectrogram. Compared to the original model, the spectrogram extension gives better mean opinion scores in subjective listening tests. We show that the intonation in the generated spectrograms matches the intonation represented by the generated pitch curves.
