WueDevils at SemEval-2022 Task 8: Multilingual News Article Similarity via Pair-Wise Sentence Similarity Matrices
Dirk Wangsadirdja, Felix Heinickel, Simon Trapp, Albin Zehe, Konstantin Kobs, Andreas Hotho
Abstract
We present a system that creates pair-wise cosine and arccosine sentence similarity matrices using multilingual sentence embeddings obtained from pre-trained SBERT and Universal Sentence Encoder (USE) models, respectively. For each sentence in a news article, it searches for the most similar sentence in the other article and computes an average score. Further, a convolutional neural network calculates a total similarity score for each article pair from these matrices. Finally, a random forest regressor merges the previous results into a final score, which can optionally be extended with a publishing date score.
- Anthology ID: 2022.semeval-1.175
- Volume: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
- Month: July
- Year: 2022
- Address: Seattle, United States
- Venue: SemEval
- SIG: SIGLEX
- Publisher: Association for Computational Linguistics
- Pages: 1235–1243
- URL: https://aclanthology.org/2022.semeval-1.175
- DOI: 10.18653/v1/2022.semeval-1.175
- Cite (ACL): Dirk Wangsadirdja, Felix Heinickel, Simon Trapp, Albin Zehe, Konstantin Kobs, and Andreas Hotho. 2022. WueDevils at SemEval-2022 Task 8: Multilingual News Article Similarity via Pair-Wise Sentence Similarity Matrices. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1235–1243, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal): WueDevils at SemEval-2022 Task 8: Multilingual News Article Similarity via Pair-Wise Sentence Similarity Matrices (Wangsadirdja et al., SemEval 2022)
- PDF: https://preview.aclanthology.org/nodalida-main-page/2022.semeval-1.175.pdf
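
The matrix and best-match steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, the toy 2-dimensional vectors stand in for real SBERT or USE sentence embeddings, the arccosine variant is assumed to be the angular similarity 1 - arccos(cos)/π, and averaging the best matches in both directions is an assumption about how the average score is formed.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pair-wise cosine similarity between sentence embeddings a (n x d) and b (m x d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def angular_similarity_matrix(a, b):
    """Arccosine-based similarity, assumed here to be 1 - arccos(cos) / pi."""
    cos = np.clip(cosine_similarity_matrix(a, b), -1.0, 1.0)
    return 1.0 - np.arccos(cos) / np.pi

def best_match_score(sim):
    """Average of each sentence's best match in the other article (both directions)."""
    row_best = sim.max(axis=1)  # best match in article B for each sentence of A
    col_best = sim.max(axis=0)  # best match in article A for each sentence of B
    return float((row_best.mean() + col_best.mean()) / 2.0)

# Toy embeddings (assumption): 2 sentences in article A, 3 in article B.
article_a = np.array([[1.0, 0.0], [0.0, 1.0]])
article_b = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])

sim = cosine_similarity_matrix(article_a, article_b)
ang = angular_similarity_matrix(article_a, article_b)
score = best_match_score(sim)
```

In the described system, such matrices are also fed to a convolutional neural network, and the resulting scores are merged by a random forest regressor; those stages are omitted here.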