Annotation as Cultural Interpretation: Rethinking Data Labeling in NLP

Wajdi Zaghouani


Abstract
Human annotation is a foundational component of modern natural language processing (NLP). Labeled datasets underpin widely used benchmarks for sentiment analysis, toxicity detection, hate speech classification, and stance detection. Within standard NLP workflows, annotation is generally treated as a technical process aimed at recovering an objective ground truth according to predefined guidelines. This paper argues that such a view overlooks the inherently interpretive nature of annotation. Drawing on insights from sociolinguistics, discourse analysis, and cultural theory, and on a growing empirical literature on annotator subjectivity, we propose that annotation should be understood as a culturally situated interpretive practice. Annotators rely on culturally shaped norms, values, and communicative expectations when interpreting linguistic meaning, and labels in NLP datasets often reflect culturally specific interpretations rather than universal truths. We position this argument relative to recent work on perspectivism, annotator-aware modeling, and cross-cultural annotation, and we use published findings from large-scale cross-cultural annotation studies to illustrate the concrete consequences of treating annotation as objective. We close with a research agenda for culturally informed annotation practice that includes operational recommendations on documentation, modeling, and evaluation.
Anthology ID:
2026.c3nlp-1.1
Volume:
Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Vinodkumar Prabhakaran, Sunipa Dev, Luciana Benotti, Daniel Hershcovich, Yong Cao, Li Zhou, Bolei Ma, Ife Adebara
Venues:
C3NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
1–10
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.c3nlp-1.1/
Cite (ACL):
Wajdi Zaghouani. 2026. Annotation as Cultural Interpretation: Rethinking Data Labeling in NLP. In Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP 2026), pages 1–10, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Annotation as Cultural Interpretation: Rethinking Data Labeling in NLP (Zaghouani, C3NLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.c3nlp-1.1.pdf