Multilingual speech recognition models can recognize speech in several languages. Depending on the amount of training data and the relatedness of the languages, these models can outperform their monolingual counterparts. However, their performance relies heavily on an externally provided language-id, which is used either to augment the input features or to modulate the neural network's per-layer outputs through a language-gate. In this paper, we introduce a novel technique that infers the language-id in a streaming fashion using the RNN-T loss, eliminating the need to know the utterance's language in advance. We conduct experiments on two sets of languages, Arabic and Nordic, and demonstrate the effectiveness of our approach.