CogALex-V Shared Task: GHHH - Detecting Semantic Relations via Word Embeddings
Abstract
This paper describes our system submitted to the CogALex-2016 Shared Task on the Corpus-Based Identification of Semantic Relations. On the test set, our system achieves an F-measure of 88.1\% (79.0\% for TRUE only) on Task-1, detecting semantic similarity, and 76.0\% (42.3\% when excluding RANDOM) on Task-2, identifying finer-grained semantic relations. In our experiments, we tried word analogy, linear regression, and multi-task Convolutional Neural Networks (CNNs), using word embeddings taken from publicly available word vectors. We found that linear regression performs better on binary classification (Task-1), while the CNN performs better on multi-class semantic classification (Task-2).
We hypothesize that word analogy is better suited to producing deterministic answers than to handling the ambiguity of one-to-many and many-to-many relationships. We also show that classifier performance can benefit from balancing the frequency of labels in the training data.