Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance

Jingwei Ni

Zhijing Jin

Markus Freitag

Mrinmaya Sachan

Bernhard Schölkopf

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, pp. 5303-5320

Abstract

Human-translated text displays distinct features from naturally written text in the same language. This phenomenon, known as translationese, has been argued to confound machine translation (MT) evaluation. Yet we find that existing work on translationese neglects some important factors, and that its conclusions are mostly correlational rather than causal. In this work, we collect CAUSALMT, a dataset in which the MT training data are also labeled with the human translation directions. We inspect two critical factors: train-test alignment (whether the human translation directions in the training and test sets are aligned) and data-model alignment (whether the model learns in the same direction as the human translation direction in the dataset). We show that these two factors have a large causal effect on MT performance, in addition to the test-model misalignment highlighted by existing work on the impact of translationese in the test set. In light of our findings, we provide a set of suggestions for MT training and evaluation.
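The three alignment notions named in the abstract can be made concrete with a small sketch. This is only an illustration of the definitions, not the authors' code: the `Setup` class, the `alignment_factors` function, and the `"en->de"` direction strings are all hypothetical names chosen here, and the sketch assumes each data split carries a single human translation direction label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """One experimental configuration (hypothetical structure, for illustration only)."""
    train_human_dir: str  # direction in which humans translated the training data, e.g. "en->de"
    test_human_dir: str   # direction in which humans translated the test data
    model_dir: str        # direction in which the MT model translates


def alignment_factors(s: Setup) -> dict:
    """Return the three alignment indicators described in the abstract."""
    return {
        # Do the training and test sets share the same human translation direction?
        "train_test_aligned": s.train_human_dir == s.test_human_dir,
        # Does the model learn in the same direction as the training data was translated?
        "data_model_aligned": s.train_human_dir == s.model_dir,
        # Does the model translate in the same direction as the test set was translated?
        "test_model_aligned": s.test_human_dir == s.model_dir,
    }


if __name__ == "__main__":
    # Training data translated de->en by humans, test data translated en->de,
    # and a model that translates en->de.
    print(alignment_factors(Setup("de->en", "en->de", "en->de")))
```

In this example configuration only the test set matches the model direction, so both factors the paper adds (train-test and data-model alignment) are violated while the previously studied test-model alignment holds.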
