Exploratory Study on the Impact of English Bias of Generative Large Language Models in Dutch and French

Ayla Rigouts Terryn, Miryam de Lhoneux


Abstract
The most widely used LLMs, such as GPT-4 and Llama 2, are trained on large amounts of data, mostly in English, but are still able to deal with non-English languages. This English bias leads to lower performance in other languages, especially low-resource ones. This paper studies the linguistic quality of LLMs in two non-English high-resource languages, Dutch and French, with a focus on the influence of English. We first construct a comparable corpus of text generated by humans versus LLMs (GPT-4, Zephyr, and GEITje) in the news domain. We then annotate linguistic issues in the LLM-generated texts, obtaining high inter-annotator agreement, and analyse these annotated issues. We find a substantial influence of English for all models under all conditions: on average, 16% of all annotations of linguistic errors or peculiarities had a clear link to English. Fine-tuning an LLM to a target language (GEITje is fine-tuned on Dutch) reduces the number of linguistic issues and probably also the influence of English. We further find that using a more elaborate prompt leads to linguistically better results than a concise prompt. Finally, increasing the temperature for one of the models leads to lower linguistic quality but does not alter the influence of English.
Anthology ID:
2024.humeval-1.2
Volume:
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues:
HumEval | WS
Publisher:
ELRA and ICCL
Pages:
12–27
URL:
https://aclanthology.org/2024.humeval-1.2
Cite (ACL):
Ayla Rigouts Terryn and Miryam de Lhoneux. 2024. Exploratory Study on the Impact of English Bias of Generative Large Language Models in Dutch and French. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 12–27, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Exploratory Study on the Impact of English Bias of Generative Large Language Models in Dutch and French (Rigouts Terryn & de Lhoneux, HumEval-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.humeval-1.2.pdf