Abstract
We introduce phylogenetic and areal language features to the domain of multilingual text-to-speech (TTS) synthesis. Intuitively, enriching the existing universal phonetic features with such cross-language shared representations should benefit multilingual acoustic models and help address issues such as data scarcity for low-resource languages. We investigate these representations using acoustic models based on long short-term memory (LSTM) recurrent neural networks (RNNs). Subjective evaluations conducted on eight languages from diverse language families show that phylogenetic and areal representations lead, in some cases, to significant improvements in multilingual synthesis quality.