A Systematic Comparison of Training Criteria for Statistical Machine Translation

Sasa Hasan
Hermann Ney
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), ACL, Prague, Czech Republic (2007), pp. 524-532

Abstract

We address the problem of training the free parameters of a statistical machine translation system. We show significant improvements over a state-of-the-art minimum error rate training baseline on a large Chinese-English translation task. We present novel training criteria based on maximum likelihood estimation and expected loss computation. Additionally, we compare the maximum a-posteriori decision rule and the minimum Bayes risk decision rule. We show that, not only from a theoretical point of view but also in terms of translation quality, the minimum Bayes risk decision rule is preferable.
