PEAR at SemEval-2024 Task 1: Pair Encoding with Augmented Re-sampling for Semantic Textual Relatedness

Tollef Jørgensen


Abstract
This paper describes a system submitted to the supervised track (Track A) at SemEval-24: Semantic Textual Relatedness for African and Asian Languages. Challenged with datasets of varying sizes, some as small as 800 samples, we observe that the PEAR system, using smaller pre-trained masked language models to process sentence pairs (Pair Encoding), results in models that efficiently adapt to the task.In addition to the simplistic modeling approach, we experiment with hyperparameter optimization and data expansion from the provided training sets using multilingual bi-encoders, sampling a dynamic number of nearest neighbors (Augmented Re-sampling). The final models are lightweight, allowing fast experimentation and integration of new languages.
Anthology ID:
2024.semeval-1.202
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Note:
Pages:
1405–1411
Language:
URL:
https://aclanthology.org/2024.semeval-1.202
DOI:
Bibkey:
Cite (ACL):
Tollef Jørgensen. 2024. PEAR at SemEval-2024 Task 1: Pair Encoding with Augmented Re-sampling for Semantic Textual Relatedness. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1405–1411, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
PEAR at SemEval-2024 Task 1: Pair Encoding with Augmented Re-sampling for Semantic Textual Relatedness (Jørgensen, SemEval 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-checklist/2024.semeval-1.202.pdf
Supplementary material:
 2024.semeval-1.202.SupplementaryMaterial.txt
Supplementary material:
 2024.semeval-1.202.SupplementaryMaterial.zip