How Much Noise Can BERT Handle? Insights from Multilingual Sentence Difficulty Detection

Nouran Khallaf, Serge Sharoff


Abstract
Noisy training data can significantly degrade the performance of language-model-based classifiers, particularly in non-topical classification tasks. This study explores a range of denoising strategies for sentence-level difficulty detection, using training data derived from document-level difficulty annotations obtained through noisy crowdsourcing. Beyond monolingual settings, we also address cross-lingual transfer, where a multilingual language model is trained on one language and tested on another. We evaluate several noise reduction techniques, including Gaussian Mixture Models (GMM), Co-Teaching, Noise Transition Matrices, and Label Smoothing. Our results indicate that while BERT-based models exhibit inherent robustness to noise, incorporating explicit noise detection can further enhance performance. For our smaller dataset, GMM-based noise filtering proves particularly effective in improving prediction quality, raising the AUC score from 0.52 to 0.86, or to 0.92 when two denoising methods (GMM and Co-Teaching) are combined. For our larger dataset, however, the intrinsic regularisation of pre-trained language models provides a strong baseline, with denoising methods yielding only marginal gains (from 0.8948 to 0.8984, or to 0.9061 when two denoising methods are combined). Nonetheless, removing noisy sentences (about 20% of the dataset) helps in producing a cleaner corpus with fewer infelicities. As a result, we have released the largest available multilingual corpus for sentence difficulty prediction.
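The GMM-based noise filtering mentioned in the abstract can be sketched as follows. This is an illustrative, dependency-free implementation of the general technique, not the authors' code: it relies on the common heuristic that mislabelled samples incur higher training loss, fits a two-component 1-D Gaussian mixture to per-sample losses via EM, and keeps only samples that the low-mean ("clean") component claims with high posterior. All function names and the threshold are hypothetical.

```python
# Illustrative sketch of GMM-based noise filtering (not the paper's implementation).
# Heuristic: mislabelled examples tend to incur higher training loss, so we fit a
# two-component 1-D Gaussian mixture to per-sample losses with EM and keep only
# the samples that the low-mean ("clean") component claims.
import math

def _gauss(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation."""
    n = len(xs)
    srt = sorted(xs)
    mu = [srt[n // 4], srt[3 * n // 4]]            # initialise means at the quartiles
    m0 = sum(xs) / n
    v0 = sum((x - m0) ** 2 for x in xs) / n + 1e-6
    var = [v0, v0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            p = [pi[k] * _gauss(x, mu[k], var[k]) for k in (0, 1)]
            s = sum(p)
            resp.append([pk / s for pk in p] if s > 0 else [0.5, 0.5])
        # M-step: re-estimate mixture weights, means, and variances
        for k in (0, 1):
            nk = sum(r[k] for r in resp) + 1e-12
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk + 1e-6
    return mu, var, pi

def filter_clean(losses, threshold=0.5):
    """Return indices of samples whose posterior of being clean is >= threshold."""
    mu, var, pi = fit_gmm_1d(losses)
    clean = 0 if mu[0] < mu[1] else 1              # low-mean component = clean labels
    keep = []
    for i, x in enumerate(losses):
        p = [pi[k] * _gauss(x, mu[k], var[k]) for k in (0, 1)]
        s = sum(p)
        if s > 0 and p[clean] / s >= threshold:
            keep.append(i)
    return keep

# Toy example: six low-loss (clean) samples followed by four high-loss (noisy) ones.
losses = [0.10, 0.12, 0.15, 0.20, 0.18, 0.11, 2.0, 2.1, 1.9, 2.2]
print(filter_clean(losses))  # → [0, 1, 2, 3, 4, 5]
```

In practice the per-sample losses would come from a partially trained BERT classifier, and the surviving indices define the cleaned training set; the abstract's note that roughly 20% of sentences are removed corresponds to the samples the high-loss component claims.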
Anthology ID:
2026.lrec-main.485
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
6132–6143
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.485/
Cite (ACL):
Nouran Khallaf and Serge Sharoff. 2026. How Much Noise Can BERT Handle? Insights from Multilingual Sentence Difficulty Detection. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 6132–6143, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
How Much Noise Can BERT Handle? Insights from Multilingual Sentence Difficulty Detection (Khallaf & Sharoff, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.485.pdf