Multi-Accent Speech Recognition with Hierarchical Grapheme Based Models
Abstract
We explore the viability of grapheme-based recognition, specifically how it compares to phoneme-based equivalents. We use the CTC loss to train models that predict graphemes directly, and we also train models with hierarchical CTC, showing that they improve on previous CTC models. We further explore how the grapheme and phoneme models scale with large data sets, considering a single acoustic training data set that combines various dialects of English from the US, UK, India, and Australia. We show that by training a single grapheme-based model on this multi-dialect data set we create an accent-robust ASR system.
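As a rough illustration of the hierarchical CTC setup described in the abstract, the sketch below attaches a phoneme CTC loss to an intermediate encoder layer and a grapheme CTC loss to the top layer, so lower layers are supervised with phonemes while the final output is graphemes. This is a minimal PyTorch sketch under stated assumptions: the LSTM encoder, layer sizes, vocabulary sizes, and the 0.3 auxiliary-loss weight are illustrative choices, not the paper's configuration.

import torch
import torch.nn as nn

NUM_GRAPHEMES = 29   # assumed: 26 letters + space + apostrophe + CTC blank
NUM_PHONEMES = 42    # assumed phoneme inventory size + CTC blank
FEAT_DIM = 80        # assumed log-mel filterbank feature dimension

class HierarchicalCTCModel(nn.Module):
    """Encoder with an intermediate phoneme CTC head (lower layers)
    and a final grapheme CTC head (upper layers). Illustrative only."""
    def __init__(self):
        super().__init__()
        self.lower = nn.LSTM(FEAT_DIM, 256, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.upper = nn.LSTM(512, 256, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.phoneme_head = nn.Linear(512, NUM_PHONEMES)
        self.grapheme_head = nn.Linear(512, NUM_GRAPHEMES)

    def forward(self, feats):
        low, _ = self.lower(feats)   # (B, T, 512) intermediate representation
        high, _ = self.upper(low)    # (B, T, 512) top-level representation
        return (self.phoneme_head(low).log_softmax(-1),
                self.grapheme_head(high).log_softmax(-1))

model = HierarchicalCTCModel()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# Dummy batch: 4 utterances of 200 frames each (illustrative data only).
feats = torch.randn(4, 200, FEAT_DIM)
input_lens = torch.full((4,), 200, dtype=torch.long)
graphemes = torch.randint(1, NUM_GRAPHEMES, (4, 30))   # label 0 reserved for blank
grapheme_lens = torch.full((4,), 30, dtype=torch.long)
phonemes = torch.randint(1, NUM_PHONEMES, (4, 25))
phoneme_lens = torch.full((4,), 25, dtype=torch.long)

phone_logp, graph_logp = model(feats)
# nn.CTCLoss expects (T, B, C), so swap the time and batch dimensions.
loss = (ctc(graph_logp.transpose(0, 1), graphemes, input_lens, grapheme_lens)
        + 0.3 * ctc(phone_logp.transpose(0, 1), phonemes, input_lens, phoneme_lens))
loss.backward()

The design intuition is that the auxiliary phoneme loss regularizes the lower layers with a more acoustically consistent target inventory, while the grapheme head removes the need for a pronunciation lexicon at decoding time; the weighting between the two losses is a tunable assumption here.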