Abstract
Semantic textual similarity (STS) systems estimate the degree of meaning similarity between two sentences; cross-lingual STS systems do the same for two sentences, each in a different language. State-of-the-art algorithms usually employ strongly supervised, resource-rich approaches that are difficult to apply to poorly-resourced languages. However, any approach needs evaluation data to confirm its results. To simplify the evaluation process for languages that lack STS evaluation datasets, we present new datasets for cross-lingual and monolingual STS. We also report the results of several state-of-the-art methods on these data, which can serve as a baseline for further research. We believe that this article will not only extend current STS research to other languages, but will also encourage competition on this new evaluation data.

- Anthology ID: 2021.ranlp-1.59
- Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month: September
- Year: 2021
- Address: Held Online
- Editors: Ruslan Mitkov, Galia Angelova
- Venue: RANLP
- Publisher: INCOMA Ltd.
- Pages: 524–529
- URL: https://aclanthology.org/2021.ranlp-1.59
- Cite (ACL): Tomáš Hercig and Pavel Kral. 2021. Evaluation Datasets for Cross-lingual Semantic Textual Similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 524–529, Held Online. INCOMA Ltd.
- Cite (Informal): Evaluation Datasets for Cross-lingual Semantic Textual Similarity (Hercig & Kral, RANLP 2021)
- PDF: https://preview.aclanthology.org/fix-dup-bibkey/2021.ranlp-1.59.pdf