Stylistic Transfer from Annotator Communities to Large Language Models

Jay Chooi


Abstract
Large language models (LLMs) are post-trained on human feedback collected from annotator communities, yet the linguistic influence of these annotator communities on language models remains poorly understood. We investigated the stylistic transfer from Nigerian annotators to the LLaMA family of models through a natural experiment with LLaMA 2 and LLaMA 3.1, as their release dates are separated by the shutdown of a major data annotation service provider in Nigeria. We generated corpora from both model families and measured linguistic style by computing the difference-in-difference of the Jensen-Shannon distance on the bigram distribution between model outputs and corpora of Nigerian English and US English. We found that, although both pre-trained model variants exhibit similar proximity to both English variants, the LLaMA 2 post-trained model moved toward Nigerian English, while the LLaMA 3.1 post-trained model moved away from Nigerian English. Qualitatively, we found that post-trained LLaMA 2 models used significantly fewer contractions, in line with Nigerian English speakers opting to use a formal register due to its role as an index of knowledgeability. Our findings suggest that annotator communities can imprint linguistic style on large language models, with potential implications such as a disproportionately higher false positive rate in AI plagiarism detection for users who share a linguistic style with annotator communities.
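The measurement described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: whitespace tokenization, the base-2 logarithm, the sign convention of the difference-in-difference, and all function names (`bigram_distribution`, `js_distance`, `relative_gap`, `difference_in_difference`) are assumptions made for illustration only.

```python
import math
from collections import Counter


def bigram_distribution(text):
    """Relative frequencies of adjacent token pairs (assumed: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    counts = Counter(zip(tokens, tokens[1:]))
    total = sum(counts.values())
    return {bigram: count / total for bigram, count in counts.items()}


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base 2, so in [0, 1])."""
    divergence = 0.0
    for bigram in set(p) | set(q):
        pi, qi = p.get(bigram, 0.0), q.get(bigram, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def relative_gap(model_text, nigerian_text, us_text):
    """Distance to Nigerian English minus distance to US English (assumed convention)."""
    model = bigram_distribution(model_text)
    to_nigerian = js_distance(model, bigram_distribution(nigerian_text))
    to_us = js_distance(model, bigram_distribution(us_text))
    return to_nigerian - to_us


def difference_in_difference(pre_text, post_text, nigerian_text, us_text):
    """Change in the relative gap from the pre-trained to the post-trained model.

    Under the assumed convention, a negative value means post-training moved
    the model toward Nigerian English relative to US English.
    """
    return (relative_gap(post_text, nigerian_text, us_text)
            - relative_gap(pre_text, nigerian_text, us_text))
```

Under these assumptions, the abstract's finding would correspond to a negative value for LLaMA 2 and a positive value for LLaMA 3.1 when each model's pre-trained and post-trained outputs are compared against the same two reference corpora.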
Anthology ID:
2026.latechclfl-1.13
Volume:
Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Diego Alves, Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Janis Pagel, Stan Szpakowicz
Venues:
LaTeCH-CLfL | WS
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
135–145
URL:
https://preview.aclanthology.org/ingest-eacl/2026.latechclfl-1.13/
Cite (ACL):
Jay Chooi. 2026. Stylistic Transfer from Annotator Communities to Large Language Models. In Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026, pages 135–145, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Stylistic Transfer from Annotator Communities to Large Language Models (Chooi, LaTeCH-CLfL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.latechclfl-1.13.pdf
Supplementary material:
2026.latechclfl-1.13.SupplementaryMaterial.zip