Google Research

Federated Learning Of Out-Of-Vocabulary Words

NeurIPS 2019 Federated Learning Workshop (2019)

Abstract

We demonstrate that a character-level recurrent neural network can learn out-of-vocabulary (OOV) words under federated learning (FL) settings, for the purpose of expanding the vocabulary of a virtual keyboard for smartphones without exporting sensitive text to servers. High-frequency words can be sampled from the trained generative model by drawing from the joint posterior directly. We study the feasibility of the approach in two settings: (1) simulated FL on a publicly available non-IID per-user dataset from a popular social networking website, and (2) FL on data hosted on user mobile devices. The model achieves good recall and precision compared to ground-truth OOV words in setting (1). In setting (2), we demonstrate its practicality by showing meaningful OOV words and good character-level prediction accuracy and cross-entropy loss.
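The sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a plain Elman-style recurrent cell with randomly initialized weights standing in for a federated-trained model, and it draws words character by character until an end-of-word marker appears, approximating "drawing from the joint posterior" by Monte Carlo sampling and keeping the most frequent results. All names and dimensions here are assumptions for illustration.

```python
import numpy as np
from collections import Counter

# Character inventory: lowercase letters plus an end-of-word marker.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
END = "$"
CHARS = ALPHABET + END
V, H = len(CHARS), 32  # vocab size, hidden size (illustrative)

rng = np.random.default_rng(0)
# Randomly initialized weights stand in for the federated-trained model.
Wxh = rng.normal(0, 0.3, (H, V))
Whh = rng.normal(0, 0.3, (H, H))
Why = rng.normal(0, 0.3, (V, H))
bh = np.zeros(H)
by = np.zeros(V)

def sample_word(max_len=20):
    """Draw one word from the character-level RNN, one character at a time."""
    h = np.zeros(H)
    x = np.zeros(V)  # all-zero input acts as a start-of-word token
    out = []
    for _ in range(max_len):
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # recurrent state update
        logits = Why @ h + by
        p = np.exp(logits - logits.max())     # softmax over next character
        p /= p.sum()
        c = rng.choice(V, p=p)
        if CHARS[c] == END:
            break
        out.append(CHARS[c])
        x = np.zeros(V)
        x[c] = 1.0                            # one-hot feedback of sampled char
    return "".join(out)

# Repeated sampling approximates drawing high-frequency words from the
# model's joint distribution over character sequences.
counts = Counter(sample_word() for _ in range(1000))
top = counts.most_common(5)
```

With a trained model, `top` would contain the high-frequency OOV words to add to the keyboard vocabulary; here, with random weights, it merely demonstrates the sampling mechanics.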
