Tuplemax Loss for Language Identification

Li Wan
Prashant Sridhar
Yang Yu
ICASSP 2019


In many language identification scenarios, the user specifies, from the large set of all supported languages, the small set of languages they speak. This selection usually happens before real-time identification begins. For example, a language identification system launched in North America may have $95\%$ of its users speaking at most two languages. We incorporate this prior knowledge into neural network training by replacing the commonly used softmax loss with a novel loss function named \emph{tuplemax loss}. Combined with a sliding-window LSTM inference approach, our language identification system achieves a $2.33\%$ error rate, a $48.5\%$ relative improvement over the $4.50\%$ error rate of the standard softmax loss method.
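The abstract does not state the tuplemax formula. The sketch below illustrates one plausible reading, labeled here as an assumption: since users pick only a small tuple of languages, the loss averages the two-class softmax loss between the target language and each other language, instead of normalizing over all languages at once. At inference, it assumes the prediction is restricted to the user's selected languages. The helpers `tuplemax_loss`, `softmax_loss`, and `predict_among` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def tuplemax_loss(logits: Sequence[float], target: int) -> float:
    """Assumed tuplemax: mean pairwise softmax loss of target vs. each other class.

    For each non-target class k, the 2-tuple loss is
    -log(e^{z_t} / (e^{z_t} + e^{z_k})) = -log_sigmoid(z_t - z_k).
    """
    n = len(logits)
    if n < 2:
        raise ValueError("need at least two classes")
    z_t = logits[target]
    total = sum(-log_sigmoid(z_t - z_k)
                for k, z_k in enumerate(logits) if k != target)
    return total / (n - 1)


def softmax_loss(logits: Sequence[float], target: int) -> float:
    """Standard cross-entropy over all classes, via a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def predict_among(logits: Sequence[float], allowed: Sequence[int]) -> int:
    """Hypothetical inference rule: best-scoring language within the user's set."""
    return max(allowed, key=lambda k: logits[k])


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]  # toy logits for three languages
    print(tuplemax_loss(scores, 0), softmax_loss(scores, 0))
    print(predict_among([0.1, 3.0, 2.0], allowed=[0, 2]))
```

With exactly two classes the two losses coincide, which is consistent with the idea that tuplemax targets the case where a user has selected only a couple of languages.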
