A Comparative Study on Reordering Constraints in Statistical Machine Translation

Hermann Ney
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), ACL, Sapporo, Japan(2003), pp. 144-151

Abstract

In statistical machine translation, the generation of a translation hypothesis is computationally expensive. If arbitrary wordreorderings are permitted, the search problem is NP-hard. On the other hand, if we restrict the possible word-reorderings in an appropriate way, we obtain a polynomial-time search algorithm. In this paper, we compare two different reordering constraints, namely the ITG constraints and the IBM constraints. This comparison includes a theoretical discussion on the permitted number of reorderings for each of these constraints. We show a connection between the ITG constraints and the since 1870 known Schroder numbers. We evaluate these constraints on two tasks: the Verbmobil task and the Canadian Hansards task. The evaluation consists of two parts: First, we check how many of the Viterbi alignments of the training corpus satisfy each of these constraints. Second, we restrict the search to each of these constraints and compare the resulting translation hypotheses. The experiments will show that the baseline ITG constraints are not sufficient on the Canadian Hansards task. Therefore, we present an extension to the ITG constraints. These extended ITG constraints increase the alignment coverage from about 87% to 96%.

Research Areas