Federated learning of out-of-vocabulary words

Françoise Simone Beaufays; Mingqing Chen; Rajiv Mathews; Tom Ouyang

(2019)

Abstract

We demonstrate that a character-level LSTM neural network is able to learn
out-of-vocabulary (OOV) words for the purpose of expanding the vocabulary of
a virtual keyboard for smartphones.
We train such a model using a distributed, on-device learning framework called federated learning.
High-frequency words can then be sampled from the generative model
by drawing from the joint posterior directly.
We study the feasibility of the approach in three different settings:
(1) using stochastic gradient descent, on an anonymized dataset of snippets of user content;
(2) using simulated federated learning, on a publicly available non-IID per-user dataset from a popular social networking website;
(3) using federated learning, on data hosted on user mobile devices.
The model is shown to achieve good recall and precision when compared to ground-truth OOV words in settings (1) and (2).
In setting (3), we demonstrate the practicality of this approach by showing that meaningful OOV words can be learned without exporting sensitive user data to servers.
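
The sampling step described in the abstract can be illustrated with a minimal sketch. It assumes that "drawing from the joint posterior" can be approximated by ancestral sampling: characters are drawn one at a time from the model's next-character distribution until an end-of-word symbol appears, and the sampled words are then counted. The toy lookup table `TOY_MODEL` stands in for the trained character-level LSTM, and all names here (`next_char_probs`, `sample_word`, `sample_frequent_words`, the `^` and `$` markers, the thresholds) are hypothetical and not taken from the paper.

```python
import random
from collections import Counter

END = "$"    # hypothetical end-of-word symbol
START = "^"  # hypothetical start-of-word marker

# Toy next-character distributions keyed by the previous character.
# This table is an assumption standing in for the trained LSTM.
TOY_MODEL = {
    START: {"l": 0.6, "o": 0.4},
    "l": {"o": 1.0},
    "o": {"l": 0.5, END: 0.5},
}


def next_char_probs(prefix):
    """Return P(next char | prefix); a real model would condition on the whole prefix."""
    last = prefix[-1] if prefix else START
    return TOY_MODEL[last]


def sample_word(rng, max_len=20):
    """Draw one word character by character (ancestral sampling).

    Words that reach max_len without an end symbol are returned truncated.
    """
    chars = []
    while len(chars) < max_len:
        probs = next_char_probs("".join(chars))
        symbols = list(probs)
        weights = list(probs.values())
        char = rng.choices(symbols, weights=weights, k=1)[0]
        if char == END:
            break
        chars.append(char)
    return "".join(chars)


def sample_frequent_words(num_samples, min_count, vocabulary, seed=0):
    """Sample many words and keep frequent ones that are not already in the vocabulary."""
    rng = random.Random(seed)
    counts = Counter(sample_word(rng) for _ in range(num_samples))
    return {
        word: count
        for word, count in counts.items()
        if count >= min_count and word not in vocabulary
    }


if __name__ == "__main__":
    # "lo" plays the role of a word the keyboard already knows.
    print(sample_frequent_words(num_samples=2000, min_count=50, vocabulary={"lo"}))
```

Because frequent words are sampled more often, a count threshold over the samples acts as a simple filter for high-frequency candidates. The threshold and sample size used here are arbitrary illustration values.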
