Meaning Variation and Data Quality in the Corpus of Founding Era American English

Dallas Card

Abstract

Legal scholars are increasingly using corpus-based methods to assess historical meaning. Among work focused on the so-called founding era (mid-to-late 18th century), the majority of such studies use the Corpus of Founding Era American English (COFEA) and rely on methods such as word counting and manual coding. Here, we demonstrate what can be inferred about meaning change and variation using more advanced NLP methods, focusing on terms in the U.S. Constitution. We also carry out a data quality assessment of COFEA that points out issues with OCR quality and metadata, compare diachronic change to synchronic variation, and discuss the limitations of using NLP methods to study historical meaning.

Anthology ID: 2025.acl-short.66
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 841–856
URL: https://preview.aclanthology.org/landing_page/2025.acl-short.66/
Cite (ACL): Dallas Card. 2025. Meaning Variation and Data Quality in the Corpus of Founding Era American English. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 841–856, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Meaning Variation and Data Quality in the Corpus of Founding Era American English (Card, ACL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.acl-short.66.pdf
