Contextual Language Model Adaptation Using Dynamic Classes
Abstract
Recent focus on assistant products has increased the need for extremely
flexible speech systems that adapt
well to specific users' needs. An important aspect of this is enabling users to
make voice commands referencing their own personal data, such as favorite songs,
application names, and contacts. Recognition accuracy for common commands such
as playing music and sending text messages can be greatly improved if we know a
user's preferences.
In the past, we have addressed this problem using class-based language models
that allow for query-time injection of class instances. However, this approach
is limited by the need to train class-based models ahead of time.
In this work, we present a significantly more flexible system for query-time
injection of user context. Our system dynamically injects classes
into a non-class-based language model, which removes the need to select the classes
at language model training time. Instead, our system can vary the classes on a
per-client, per-use-case, or even per-request basis.
With the ability to inject new classes per request, as outlined in this work, our
speech system can support a diverse set of use cases by
taking advantage of a wide range of contextual information specific to each
use case.
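
As a rough illustration of what per-request class injection means in practice, the following toy sketch scores a word sequence with a plain bigram table and lets each request supply its own classes. It is not the system described in this work: the names `BASE_BIGRAMS`, `score`, `dynamic_classes`, and the "proxy word" mechanism (a class member borrows the statistics of a word the base model already knows) are all assumptions made for illustration, and the probabilities are made up.

```python
import math

# Toy bigram log-probabilities for a base model with no built-in classes.
# All values are invented for illustration only.
BASE_BIGRAMS = {
    ("<s>", "call"): math.log(0.5),
    ("call", "mom"): math.log(0.2),
    ("mom", "</s>"): math.log(0.9),
}
UNSEEN = math.log(1e-6)  # crude floor for bigrams the base model has not seen


def score(words, dynamic_classes=None):
    """Return a log-probability for `words`.

    `dynamic_classes` is supplied per request (hypothetical format):
    {"$CONTACT": {"proxy": "mom", "members": ["zofia", "ravi"]}}
    Each member is scored as if it were the proxy word.
    """
    proxy = {}
    for cls in (dynamic_classes or {}).values():
        for member in cls["members"]:
            proxy[member] = cls["proxy"]
    # Map class members to their proxy; all other words pass through unchanged.
    tokens = ["<s>"] + [proxy.get(w, w) for w in words] + ["</s>"]
    return sum(BASE_BIGRAMS.get(pair, UNSEEN) for pair in zip(tokens, tokens[1:]))


# The same base model serves two requests with different context.
contacts = {"$CONTACT": {"proxy": "mom", "members": ["zofia", "ravi"]}}
without_context = score(["call", "zofia"])
with_context = score(["call", "zofia"], dynamic_classes=contacts)
```

Because the classes arrive with the request rather than being fixed when the table is built, a different request could pass song titles or application names instead of contacts without retraining anything. This sketch handles single-word members only.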