OversampledML at SemEval-2022 Task 8: When multilingual news similarity met Zero-shot approaches

Mayank Jobanputra, Lorena Martín Rodríguez


Abstract
We investigate the capabilities of pre-trained models, without any fine-tuning, on the document-level multilingual news similarity task of SemEval-2022. We use the title and content of each news article, with appropriate pre-processing. Our system derives 14 different similarity features by combining a state-of-the-art method (MPNet) with well-known statistical methods (i.e., TF-IDF and Word Mover's Distance). We formulate the multilingual news similarity task as a regression task and approximate the overall similarity between two news articles using these features. Our best-performing system achieved a correlation score of 70.1% and was ranked 20th among the 34 participating teams. In addition to the system description, this paper provides further analysis of our results and an ablation study highlighting the strengths and limitations of our features. We make our code publicly available at https://github.com/cicl-iscl/multinewssimilarity
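As an illustration of the kind of statistical feature the abstract mentions, the following is a minimal sketch of TF-IDF cosine similarity computed separately over the titles and the contents of an article pair. It is not the authors' implementation: the function names, the `title`/`text` dictionary keys, whitespace tokenization, and fitting IDF on only the two documents being compared are all assumptions made to keep the example self-contained. In a full system, features like these would be passed to a regressor that predicts the overall similarity score.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, standing in for real pre-processing.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using smoothed IDF."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(article_a, article_b):
    """Return [title_similarity, text_similarity] for a pair of articles.

    Hypothetical input format: dicts with 'title' and 'text' keys.
    """
    title_vecs = tfidf_vectors([article_a["title"], article_b["title"]])
    text_vecs = tfidf_vectors([article_a["text"], article_b["text"]])
    return [cosine(*title_vecs), cosine(*text_vecs)]
```

A pair of identical articles yields features close to 1, while a pair with no shared tokens yields 0 for both features.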
Anthology ID:
2022.semeval-1.165
Volume:
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1171–1177
URL:
https://aclanthology.org/2022.semeval-1.165
DOI:
10.18653/v1/2022.semeval-1.165
Cite (ACL):
Mayank Jobanputra and Lorena Martín Rodríguez. 2022. OversampledML at SemEval-2022 Task 8: When multilingual news similarity met Zero-shot approaches. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1171–1177, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
OversampledML at SemEval-2022 Task 8: When multilingual news similarity met Zero-shot approaches (Jobanputra & Martín Rodríguez, SemEval 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.semeval-1.165.pdf
Video:
 https://preview.aclanthology.org/emnlp-22-attachments/2022.semeval-1.165.mp4
Code
 cicl-iscl/multinewssimilarity