From audio to semantics: Approaches to end-to-end spoken language understanding

Parisa Haghani; Arun Narayanan; Michiel Adriaan Unico Bacchiani; Galen Chuang; Neeraj Gaur; Pedro Jose Moreno Mengibar; Delia Qu; Rohit Prabhavalkar; Austin Waters

Spoken Language Technology Workshop (SLT), 2018 IEEE

Abstract

Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to text, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of intents and arguments. These modules are typically optimized independently. In this paper, we formulate audio-to-semantic understanding as a sequence-to-sequence problem. We propose and compare various encoder-decoder-based approaches that optimize both modules jointly, in an end-to-end manner. We evaluate these methods on a real-world task. Our results show that having an intermediate text representation while jointly optimizing the full system improves prediction accuracy.
