Language-agnostic Multilingual Modelling
Abstract
Automated speech recognition (ASR) coverage of the world's languages continues to expand. Yet as data-demanding neural network models continue to revolutionize the field, their data requirements pose a challenge for data-scarce languages. Multilingual models allow data-scarce and data-rich languages to be trained jointly, enabling data and parameter sharing. One of the main goals of multilingual ASR is to build a single model for all languages, reaping the benefits of sharing on data-scarce languages without degrading performance on data-rich languages. However, most state-of-the-art multilingual models require the encoding of language information and are therefore not as flexible or scalable when expanding to new languages. Language-independent multilingual models help to address this, and are also better suited to multicultural societies such as India, where languages overlap and are frequently used together by native speakers. In this paper, we propose a new approach to building a language-agnostic multilingual ASR system using transliteration. This training strategy maps all languages to one writing system through a many-to-one transliteration transducer, which maps similar-sounding acoustics to a single target sequence such as graphemes, phonemes or wordpieces, resulting in improved data sharing and reduced phonetic confusions. We show with four Indic languages, namely Hindi, Bengali, Tamil and Kannada, that the resulting multilingual model achieves performance comparable to a language-dependent multilingual model, with an improvement of up to 15\% relative on the data-scarce language.