SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation
Abstract
Semantic Textual Similarity (STS) is the degree of semantic equivalence between two snippets of text. Similarity is expressed on an ordinal scale that spans from semantic equivalence to complete dissimilarity, with intermediate values capturing specifically defined levels of partial similarity. While prior evaluations were restricted to monolingual text snippets, the 2016 shared task includes a pilot sub-task on computing semantic similarity between cross-lingual text snippets. This year's traditional monolingual sub-task evaluates English text snippets from four domains: Plagiarism Detection, Post-Edited Machine Translations, Question-Answering, and News Article Headlines. From the question-answering domain, we include both question-question and answer-answer pairs. The cross-lingual sub-task provides paired English-Spanish text snippets drawn from the same sources as the monolingual data, as well as independently sampled news data. The monolingual sub-task attracted 42 participating teams producing 118 system submissions, while the cross-lingual pilot sub-task attracted 24 teams submitting 26 systems.