The recent focus on voice assistant products has increased the need for highly flexible speech systems that adapt to individual users' needs. An important aspect of this is enabling users to issue voice commands that reference their own personal data, such as favorite songs, application names, and contacts. Recognition accuracy for common commands such as playing music and sending text messages can be greatly improved if we know a user's preferences.
In the past, we have addressed this problem using class-based language models that allow for query-time injection of class instances. However, this approach is limited by the need to train class-based models ahead of time.
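To make the class-based approach concrete, the following is a minimal, illustrative sketch (not the production system): the language model scores a word either directly or through a class symbol such as `$CONTACT`, whose instances are injected at query time from user data. All names, the toy bigram table, and the uniform in-class distribution are assumptions for illustration; in this scheme, note that the class set itself is still fixed when the model is trained.

```python
import math

# Toy class-level bigram LM: P(token | previous token), where tokens may be
# class symbols such as "$CONTACT". Purely illustrative values.
CLASS_LM = {
    ("call",): {"$CONTACT": 0.6, "the": 0.4},
}

def score(history, word, user_classes):
    """Return log P(word | history) under the class-based LM.

    user_classes maps a class symbol to this user's instances, injected
    at query time (e.g. the user's contact list).
    """
    dist = CLASS_LM.get(tuple(history), {})
    # Direct path: the word appears as a plain vocabulary token.
    p = dist.get(word, 0.0)
    # Class path: P(class | history) * P(word | class), here with a
    # uniform in-class distribution over the injected instances.
    for cls, members in user_classes.items():
        if word in members and cls in dist:
            p += dist[cls] * (1.0 / len(members))
    return math.log(p) if p > 0 else float("-inf")

contacts = {"$CONTACT": ["alice", "bob"]}
print(score(["call"], "alice", contacts))  # log(0.6 * 0.5) = log(0.3)
```

The limitation described above is visible here: `$CONTACT` must already exist as a token in `CLASS_LM`, so any class not anticipated at training time cannot be injected.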
In this work, we present a significantly more flexible system for query-time injection of user context: one that dynamically injects classes into a non-class-based language model. This removes the need to select the classes at language model training time; instead, our system can vary the classes on a per-client, per-use-case, or even per-request basis.
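A minimal sketch of the per-request idea (illustrative only, not the paper's implementation): each request carries its own entity lists, and a score adjustment is applied on top of a general, non-class-based language model whenever a hypothesis word matches an injected entity. The function name, the fixed additive boost, and the context format are assumptions for illustration.

```python
def biased_score(base_logprob, word, request_context, boost=2.0):
    """Add a fixed log-score boost when `word` matches an injected entity.

    request_context: {class_name: [entities]} supplied with each request,
    so the classes themselves can differ per client, use case, or request.
    No class structure is baked into the base LM at training time.
    """
    for entities in request_context.values():
        if word in entities:
            return base_logprob + boost
    return base_logprob

# Two requests can inject entirely different classes.
req_a = {"songs": ["bohemian rhapsody"], "contacts": ["alice"]}
req_b = {"apps": ["notes"]}
print(biased_score(-5.0, "alice", req_a))  # -3.0: "alice" is injected here
print(biased_score(-5.0, "alice", req_b))  # -5.0: no matching entity
```

Because the class inventory lives entirely in the request payload rather than in the trained model, new classes can be introduced without retraining, which is the flexibility the preceding paragraph describes.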
With the ability to inject new classes per request outlined in this work, our speech system can support a diverse set of use cases by taking advantage of a wide range of contextual information specific to each use case.