Mitigating metric bias in minimum Bayes risk decoding
Abstract
Minimum Bayes risk (MBR) decoding has been shown to improve translation quality on both automated metrics and human evaluations. In this paper, we show that MBR decoding tends to yield larger improvements on the utility metric and on similar metrics than on unrelated metrics. To mitigate this metric bias, we explore MBR decoding with an ensemble of multiple metrics as the utility function, as well as quality estimation (QE) filtering followed by MBR decoding. Human evaluations show that using an ensemble of metrics improves quality over MBR or QE decoding with a single metric.
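As a rough illustration of MBR decoding with an ensemble utility, the following minimal Python sketch picks the candidate with the highest average utility against the other candidates, which serve as pseudo-references. The function names (`mbr_decode`, `ensemble`) and the two toy utilities (`token_overlap`, `length_ratio`) are hypothetical stand-ins for learned translation metrics. Combining metrics by a weighted average is an assumption made here for illustration; the abstract does not say how the ensemble is formed.

```python
from typing import Callable, Optional, Sequence

# A utility scores a hypothesis against a (pseudo-)reference; higher is better.
Utility = Callable[[str, str], float]


def mbr_decode(candidates: Sequence[str], utility: Utility) -> str:
    """Return the candidate with the highest mean utility against the others."""
    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], ref) for ref in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


def ensemble(utilities: Sequence[Utility],
             weights: Optional[Sequence[float]] = None) -> Utility:
    """Combine several utilities into one by weighted averaging (an assumption)."""
    if weights is None:
        weights = [1.0 / len(utilities)] * len(utilities)

    def combined(hyp: str, ref: str) -> float:
        return sum(w * u(hyp, ref) for w, u in zip(weights, utilities))

    return combined


# Toy utilities, used only so the sketch runs without external models.
def token_overlap(hyp: str, ref: str) -> float:
    """Jaccard similarity between the token sets of hyp and ref."""
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / len(h | r) if (h | r) else 1.0


def length_ratio(hyp: str, ref: str) -> float:
    """Ratio of the shorter token count to the longer one."""
    a, b = len(hyp.split()), len(ref.split())
    return min(a, b) / max(a, b) if max(a, b) else 1.0


candidates = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog ran away",
]
utility = ensemble([token_overlap, length_ratio])
print(mbr_decode(candidates, utility))
```

With real metrics, the scores would typically need to be brought onto comparable scales before averaging, since different metrics can have very different ranges.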