Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction

Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank


Abstract
Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with those on the original English CrossRE, indicating high quality of the translation and the resulting dataset.
Anthology ID:
2023.nodalida-1.9
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
80–85
URL:
https://aclanthology.org/2023.nodalida-1.9
Cite (ACL):
Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, and Barbara Plank. 2023. Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 80–85, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction (Bassignana et al., NoDaLiDa 2023)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/2023.nodalida-1.9.pdf