Machine Translation Evaluation beyond the Sentence Level

Jindřich Libovický
European Association for Machine Translation, Alicante, Spain (2018), pp. 179-188

Abstract

Automatic machine translation evaluation has been crucial for the rapid development
of machine translation systems over the last two decades. So far, most
attention has been paid to evaluation metrics that work with text on the sentence
level, as did the translation systems themselves.
Translation quality across sentences depends on discourse
phenomena that may not manifest at all within sentence
boundaries (e.g., coreference, discourse connectives, verb tense sequences).
To tackle this, we propose several document-level MT evaluation metrics:
generalizations of sentence-level metrics, language-(pair)-independent versions
of lexical cohesion scores, and measures of coreference and morphology preservation
in the target texts. We measure their agreement with human judgment on a newly
created dataset of pairwise paragraph comparisons for four language pairs.
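To make the notion of a language-independent lexical cohesion score concrete, here is a minimal sketch of one possible proxy: the fraction of content-word tokens in a document that repeat an earlier token. This is an illustration only, not the paper's exact formulation; the length-based content-word filter (`min_len`) is an assumption standing in for proper stop-word handling.

```python
from collections import Counter

def lexical_cohesion(tokens, min_len=4):
    """Crude, language-independent lexical cohesion proxy.

    Repeated content words are treated as cohesive devices.
    `min_len` is a cheap stand-in for content-word filtering:
    short function words are ignored.
    Returns the fraction of content tokens belonging to a
    word type that occurs more than once in the document.
    """
    content = [t.lower() for t in tokens if len(t) >= min_len]
    if not content:
        return 0.0
    counts = Counter(content)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(content)
```

A document-level metric along these lines could then compare the cohesion score of the system output against that of the reference, rewarding translations that preserve the reference's degree of lexical repetition.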