Tuplemax Loss for Language Identification
Abstract
In many scenarios of a language identification task, the user specifies the small set of languages that he/she speaks, out of a large set of all supported languages. This setup usually happens before real-time identification begins.
For example, a language identification system launched in North America may find that $95\%$ of its users speak at most two languages. We want to model such prior knowledge into the way we train our neural networks, by replacing the commonly used softmax loss function with a novel loss function named \emph{tuplemax loss}.
Together with a sliding-window LSTM inference approach, our language identification system achieves a $2.33\%$ error rate, a relative $48.5\%$ improvement over the $4.50\%$ error rate of the standard softmax loss method.
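The abstract does not define the tuplemax loss itself; as a rough illustration of the idea, one plausible formalization (the notation $x_1, \dots, x_n$ for the logits of the $n$ candidate languages in the user's tuple, and $t$ for the target index, is ours) is to average pairwise two-way softmax losses between the target logit and each non-target logit in the tuple:

\[
\mathcal{L}_{\text{tuplemax}}(\mathbf{x}, t) \;=\; \frac{1}{n-1} \sum_{\substack{i=1 \\ i \neq t}}^{n} -\log \frac{e^{x_t}}{e^{x_t} + e^{x_i}}.
\]

Under this reading, for the $95\%$ of users who speak at most two languages ($n = 2$), the loss reduces to an ordinary two-class softmax loss over the user's declared language pair, which is exactly the decision the system faces at identification time.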
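The sliding-window inference procedure is likewise only named here, not specified. The sketch below, in PyTorch, shows one common realization under stated assumptions: run the LSTM over overlapping fixed-length windows of the utterance and average the per-window language scores. The function name, the window length and shift, the score-averaging rule, and the model interface (a module mapping a $[1, T, D]$ feature tensor to $[1, \text{num\_languages}]$ scores) are all hypothetical.

\begin{verbatim}
import torch

def sliding_window_scores(model: torch.nn.Module,
                          features: torch.Tensor,
                          window: int = 200,
                          shift: int = 100) -> torch.Tensor:
    """Average language scores from an LSTM run over overlapping windows.

    features: [num_frames, feature_dim] acoustic features for one utterance.
    Returns: [num_languages] averaged scores. Window/shift values are
    illustrative assumptions, not the paper's settings.
    """
    num_frames = features.size(0)
    scores = []
    # Slide a fixed-length window across the utterance; a short utterance
    # falls back to a single window covering all of its frames.
    for start in range(0, max(num_frames - window, 0) + 1, shift):
        chunk = features[start:start + window].unsqueeze(0)  # [1, T, D]
        scores.append(model(chunk).squeeze(0))               # [num_languages]
    return torch.stack(scores).mean(dim=0)
\end{verbatim}

In the setup described above, the averaged scores could then be restricted to the user's declared language tuple before taking the argmax, though the exact combination rule is not given in this abstract.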