Translated Benchmarks Can Be Misleading: the Case of Estonian Question Answering
Abstract
Translated test datasets are a popular and cheaper alternative to native test datasets. However, one of the properties of translated data is the existence of cultural knowledge unfamiliar to the target language speakers. This can make translated test datasets differ significantly from native target datasets. As a result, we might inaccurately estimate the performance of the models in the target language. In this paper, we use both native and translated Estonian QA datasets to study this topic more closely. We discover that relying on the translated test dataset results in an overestimation of the model’s performance on native Estonian data.
- Anthology ID:
- 2023.nodalida-1.71
- Volume:
- Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
- Month:
- May
- Year:
- 2023
- Address:
- Tórshavn, Faroe Islands
- Editors:
- Tanel Alumäe, Mark Fishel
- Venue:
- NoDaLiDa
- Publisher:
- University of Tartu Library
- Pages:
- 710–716
- URL:
- https://aclanthology.org/2023.nodalida-1.71
- Cite (ACL):
- Hele-Andra Kuulmets and Mark Fishel. 2023. Translated Benchmarks Can Be Misleading: the Case of Estonian Question Answering. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 710–716, Tórshavn, Faroe Islands. University of Tartu Library.
- Cite (Informal):
- Translated Benchmarks Can Be Misleading: the Case of Estonian Question Answering (Kuulmets & Fishel, NoDaLiDa 2023)
- PDF:
- https://preview.aclanthology.org/teach-a-man-to-fish/2023.nodalida-1.71.pdf