SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation

Eneko Agirre
Carmen Banea
Mona Diab
Aitor Gonzalez-Agirre
Rada Mihalcea
German Rigau
Janyce Wiebe
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, San Diego, California, pp. 497-511

Abstract

Semantic Textual Similarity (STS) measures the degree of semantic equivalence between two snippets of text. Similarity is expressed on an ordinal scale that spans from semantic equivalence to complete dissimilarity, with intermediate values capturing specifically defined levels of partial similarity. While prior evaluations constrained themselves to monolingual snippets of text, the 2016 shared task includes a pilot sub-task on computing semantic similarity over cross-lingual text snippets. This year's traditional monolingual sub-task evaluates English text snippets from the following four domains: Plagiarism Detection, Post-Edited Machine Translations, Question-Answering, and News Article Headlines. From the question-answering domain, we included both question-question and answer-answer pairs. The cross-lingual sub-task provides paired English-Spanish text snippets drawn from the same sources as the monolingual data, as well as independently sampled news data. The monolingual task attracted 42 participating teams producing 118 system submissions, while the cross-lingual pilot task attracted 24 teams submitting 26 systems.
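
To make the task setup concrete, the sketch below shows a minimal bag-of-words baseline of the kind commonly used as an STS reference point: it maps a 0-1 cosine similarity onto the task's 0-5 ordinal scale (5 = semantically equivalent, 0 = completely dissimilar) and evaluates a system by Pearson correlation with gold labels, the standard STS metric. This is an illustrative assumption for exposition, not a system from the shared task, and the function names are our own.

    from collections import Counter
    import math

    def cosine_overlap(s1: str, s2: str) -> float:
        """Cosine similarity over bag-of-words token counts, in [0, 1]."""
        c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
        dot = sum(c1[t] * c2[t] for t in c1)
        norm = (math.sqrt(sum(v * v for v in c1.values()))
                * math.sqrt(sum(v * v for v in c2.values())))
        return dot / norm if norm else 0.0

    def sts_score(s1: str, s2: str) -> float:
        """Map cosine similarity onto the STS 0-5 ordinal scale."""
        return 5.0 * cosine_overlap(s1, s2)

    def pearson(xs: list, ys: list) -> float:
        """Pearson correlation between system scores and gold labels."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy) if sx and sy else 0.0

    # Example: score a pair, then evaluate predictions against gold labels.
    print(sts_score("a man is playing a guitar", "a man plays the guitar"))
    print(pearson([4.1, 0.5, 2.8], [4.0, 1.0, 3.0]))

Such surface-overlap baselines are typically weak on paraphrase-rich pairs, which is why participating systems generally combine lexical, syntactic, and distributional evidence.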