A Dual-View Analysis of Multiple Languages in Colonial Newspapers

Zhan Su, Xiaoya Chen, Fengran Mo, Ida L. Vos, Prayag Tiwari, Yazhou Zhang, Qian Zheng, Natália da Silva Perez


Abstract
Historical newspapers from the colonial period offer valuable evidence of how racializing language evolved over time. However, studying this type of historical data poses three challenges: 1) Data scarcity: large, annotated historical datasets are difficult to acquire, which hinders comprehensive analysis of racialization; 2) Noise: digitized materials frequently contain Optical Character Recognition (OCR) errors and other types of noise that complicate text extraction and computational analysis; 3) Multilingual, archaic text: colonial newspapers are often multilingual and written in archaic prose, which reduces the effectiveness of NLP tools developed for modern, single-language texts. This paper addresses these challenges through a dual-view analysis that jointly studies multilingual event extraction and temporal semantic shift. Specifically, we introduce a contextual question answering (CQA) task and a visual question answering (VQA) task derived from eighteenth- and nineteenth-century colonial newspapers. In terms of content, the QA pairs, drawn from newspapers written in Dutch, English-French, and Spanish, focus on how enslaved people were described by enslavers and how they articulated their own condition. Our results show that LLMs remain limited on low-resource VQA tasks. For temporal semantic change, we train temporal word embeddings with a compass. The study concludes that racialization is a fluid process of linguistic recalibration in which the decline of slavery merely shifted the language of control onto new categories of labor and identity.
Anthology ID:
2026.findings-acl.1029
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
20559–20573
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1029/
Cite (ACL):
Zhan Su, Xiaoya Chen, Fengran Mo, Ida L. Vos, Prayag Tiwari, Yazhou Zhang, Qian Zheng, and Natália da Silva Perez. 2026. A Dual-View Analysis of Multiple Languages in Colonial Newspapers. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20559–20573, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
A Dual-View Analysis of Multiple Languages in Colonial Newspapers (Su et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1029.pdf
Checklist:
 2026.findings-acl.1029.checklist.pdf