Back-off Language Model Compression

Boulos Harb; Ciprian Chelba; Jeffrey Dean; Sanjay Ghemawat

Back-off Language Model Compression

Boulos Harb

Ciprian Chelba

Jeffrey Dean

Sanjay Ghemawat

Proceedings of Interspeech 2009, International Speech Communication Association (ISCA), pp. 325-355

Google Scholar

Abstract

With the availability of large amounts of training data relevant to speech recognition scenarios,
scalability becomes a very productive way to improve language model performance. We present a
technique that represents a back-off n-gram language model using arrays of integer values and thus
renders it amenable to effective block compression. We propose a few such compression algorithms
and evaluate the resulting language model along two dimensions: memory footprint, and speed
reduction relative to the uncompressed one. We experimented with a model that uses a 32-bit word
vocabulary (at most 4B words) and log-probabilities/back-off-weights quantized to 1 byte, respectively.
The best compression algorithm achieves 2.6 bytes/n-gram at ≈18X slower than uncompressed. For
faster LM operation we found it feasible to represent the LM at ≈4.0 bytes/n-gram, and ≈3X slower
than the uncompressed LM. The memory footprint of a LM containing one billion n-grams can thus be
reduced to 3–4 Gbytes without impacting its speed too much.

See the presentation material from a talk about this paper.

Research Areas

Natural language processing

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Back-off Language Model Compression

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs