Gleb Mazovetskiy
Staff Software Engineer in TTS Research.
Authored Publications
Japanese text-to-pronunciation modelling is a notoriously data-intensive problem. Japanese data sources are often only partially annotated, and use different annotation standards for pronunciation and word segmentation. This talk introduces a set of techniques that enable ingesting data that may be partially annotated, use arbitrary word segmentations, and use a variety of pronunciation annotation standards.
We describe a pre-existing rule-based homograph disambiguation system used for text-to-speech synthesis at Google, and compare it to a novel system which performs disambiguation using classifiers trained on a small amount of labeled data. An evaluation of these systems, using a new, freely available English data set, finds that hybrid systems (making use of both rules and machine learning) are significantly more accurate than either hand-written rules or machine learning alone. The evaluation also finds minimal performance degradation when the hybrid system is configured to run on limited-resource mobile devices rather than on production servers. The two best systems described here are used for homograph disambiguation on all US English text-to-speech traffic at Google.