Federated Learning of N-gram Language Models
Abstract
We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a machine learning technique for training global models to be used on portable devices such as smartphones, without users' data ever leaving their devices. This is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models, primarily for latency reasons.
We propose to train a recurrent neural network language model using the decentralized "FederatedAveraging" algorithm directly on client devices, and to approximate this federated model server-side with an n-gram model that can be deployed to devices for fast inference.
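For concreteness, the server-side step of FederatedAveraging (McMahan et al., 2017) can be sketched as follows: each participating device trains the current model locally on its own data and reports back its updated weights together with its example count, and the server replaces the global model with the example-weighted average of the client models. This is a minimal illustrative sketch; the function and variable names are not from the paper.

```python
import numpy as np

def federated_averaging(global_weights, client_updates):
    """One round of the FederatedAveraging server step.

    global_weights: list of np.ndarray, the current global model.
    client_updates: list of (num_examples, weights) pairs, where
        weights is a client's locally trained copy of the model
        (same list-of-arrays structure as global_weights).
    Returns the new global weights as the example-weighted average
    of the client models.
    """
    total_examples = sum(n for n, _ in client_updates)
    new_weights = []
    for layer_idx in range(len(global_weights)):
        # Example-weighted average of this layer across all clients.
        layer = sum(
            (n / total_examples) * weights[layer_idx]
            for n, weights in client_updates
        )
        new_weights.append(layer)
    return new_weights
```

In each round, the server sends the global weights to a sample of devices, each device runs a few epochs of local SGD, and only the resulting weights (never the raw training data) are returned for averaging.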
Our technical contributions include novel ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa.
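To make the word/word-piece relationship concrete, the sketch below shows a greedy longest-match word-piece segmenter in plain Python. This is an illustrative assumption, not the paper's method: the paper's conversion algorithms operate on finite state transducers, and the "##" continuation-piece convention and toy vocabulary here are invented for the example.

```python
def wordpiece_segment(word, vocab, unk="<unk>"):
    """Greedily segment a word into the longest matching word pieces.

    vocab: set of pieces; pieces that continue a word are prefixed
    with "##" (an illustrative convention, not the paper's FST form).
    """
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no segmentation exists under this vocabulary
        pieces.append(piece)
        start = end
    return pieces

vocab = {"key", "##board", "##s"}
print(wordpiece_segment("keyboards", vocab))  # ['key', '##board', '##s']
```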
The n-gram language models trained with federated learning are compared to n-gram models trained with traditional server-based algorithms in A/B tests on tens of millions of users of a virtual keyboard.
Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the device.