QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments

David Beauchemin; Richard Khoury


Abstract

Large and Transformer-based language models (LMs) perform outstandingly in various downstream tasks. However, there is limited understanding of how these models internalize linguistic knowledge, so various linguistic benchmarks have recently been proposed to facilitate syntactic evaluation of language models across languages. This paper introduces QFrCoLA (Quebec-French Corpus of Linguistic Acceptability Judgments), a normative binary acceptability judgments dataset comprising 25,153 in-domain and 2,675 out-of-domain sentences. Our study leverages the QFrCoLA dataset and seven other linguistic binary acceptability judgment corpora to benchmark seven language models. The results demonstrate that, on average, fine-tuned Transformer-based LMs are strong baselines for most languages and that large language models (LLMs) used as zero-shot binary classifiers perform poorly on the task. On the QFrCoLA benchmark in particular, a fine-tuned Transformer-based LM outperformed, on average, the other methods tested. The results also show that the pre-trained cross-lingual LLMs selected for our experiments do not seem to have acquired linguistic judgment capabilities for Quebec French during their pre-training. Finally, our experimental results on QFrCoLA show that our dataset, built from examples that illustrate linguistic norms rather than speakers’ feelings, is similar to linguistic acceptability judgment; it is a challenging dataset that can benchmark LMs on their linguistic judgment capabilities.

Anthology ID: 2025.emnlp-main.6
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 119–130
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.6/
Cite (ACL): David Beauchemin and Richard Khoury. 2025. QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 119–130, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments (Beauchemin & Khoury, EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.6.pdf
Checklist: 2025.emnlp-main.6.checklist.pdf
