Team dina at SemEval-2022 Task 8: Pre-trained Language Models as Baselines for Semantic Similarity

Dina Pisarevskaya, Arkaitz Zubiaga


Abstract
This paper describes the participation of the team “dina” in the Multilingual News Similarity task at SemEval 2022. To build our system for the task, we experimented with several multilingual language models that were originally pre-trained for semantic similarity but were not further fine-tuned. We used these models in combination with state-of-the-art packages for machine translation and named entity recognition, with the expectation of providing valuable input to the models. Our work assesses the applicability of such “pure” models to the multilingual semantic similarity task in the case of news articles. Our best model achieved a score of 0.511, which shows that there is room for improvement.
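To make the baseline approach concrete, the sketch below scores the similarity of two news texts with an off-the-shelf multilingual sentence-similarity model and no task-specific fine-tuning, assuming the sentence-transformers library. The model checkpoint, function name, and example texts are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a zero-fine-tuning similarity baseline: embed two
# article texts with a multilingual model pre-trained for semantic
# similarity and return their cosine similarity.
from sentence_transformers import SentenceTransformer, util

# Assumption: any multilingual paraphrase/STS-style checkpoint would
# fill this role; this particular model name is illustrative.
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

def similarity_score(article_a: str, article_b: str) -> float:
    """Embed both texts and return their cosine similarity."""
    embeddings = model.encode([article_a, article_b], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

# Example usage with short stand-in texts (cross-lingual pair):
score = similarity_score(
    "The central bank raised interest rates on Tuesday.",
    "Die Zentralbank hat am Dienstag die Zinsen erhöht.",
)
print(f"similarity: {score:.3f}")  # higher means more similar
```

Such a score could then be mapped onto the task's similarity scale; the paper additionally feeds the models input derived from machine translation and named entity recognition, which this sketch omits.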
Anthology ID: 2022.semeval-1.169
Volume: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month: July
Year: 2022
Address: Seattle, United States
Venue: SemEval
SIGs: SIGLEX | SIGSEM
Publisher: Association for Computational Linguistics
Pages: 1196–1201
URL: https://aclanthology.org/2022.semeval-1.169
DOI: 10.18653/v1/2022.semeval-1.169
Cite (ACL): Dina Pisarevskaya and Arkaitz Zubiaga. 2022. Team dina at SemEval-2022 Task 8: Pre-trained Language Models as Baselines for Semantic Similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1196–1201, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): Team dina at SemEval-2022 Task 8: Pre-trained Language Models as Baselines for Semantic Similarity (Pisarevskaya & Zubiaga, SemEval 2022)
PDF: https://preview.aclanthology.org/auto-file-uploads/2022.semeval-1.169.pdf