Cross-lingual projection for class-based language models
Abstract
This paper presents a cross-lingual projection technique for training class-based language models. We borrow from previous success in projecting POS tags and NER mentions across languages, applying the same approach to the classes of a trained class-based language model. We use a CRF to train a model that predicts when a sequence of words is a member of a given class, and we use this model to label our language model training data. We show that we can successfully project the contextual cues for these classes across pairs of languages and retain a high-quality class model in languages with no supervised class data. We present empirical results that show the quality of the projected models as well as their effect on the downstream speech recognition objective. Using the projected class models, we achieve over half of the WER reduction obtained by models trained on human annotations.
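To make the class-labeling step concrete, the sketch below shows one plausible way to train a CRF that tags class-member spans with BIO labels and then applies it to unlabeled language model training text. The DATE class, the feature set, and the use of the sklearn-crfsuite package are illustrative assumptions, not the paper's actual setup.

```python
# A minimal sketch (not the paper's implementation) of CRF-based class
# tagging: train on BIO-labeled sentences for a hypothetical DATE class,
# then label unsupervised LM training text so the tagged spans can be
# replaced by class tokens. Assumes the third-party sklearn-crfsuite package.
import sklearn_crfsuite

def token_features(sent, i):
    """Simple lexical and contextual features for token i."""
    return {
        "word": sent[i].lower(),
        "is_digit": sent[i].isdigit(),
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
    }

def featurize(sent):
    return [token_features(sent, i) for i in range(len(sent))]

# Toy supervised data: sentences with BIO labels for the DATE class.
train_sents = [
    (["call", "me", "on", "march", "fifth"], ["O", "O", "O", "B-DATE", "I-DATE"]),
    (["the", "meeting", "is", "on", "monday"], ["O", "O", "O", "O", "B-DATE"]),
]
X_train = [featurize(s) for s, _ in train_sents]
y_train = [labels for _, labels in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

# Label unlabeled LM training text; tagged spans become class tokens.
print(crf.predict([featurize(["see", "you", "on", "friday"])]))
```

In the cross-lingual setting described above, the supervised BIO labels would come not from human annotation but from projecting class spans across an aligned language pair.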