Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis

Hale Sirin, Sabrina Li, Thomas Lippincott


Abstract
In this study, we present a generalizable workflow for identifying documents written in Armeno-Turkish, a historical language variety with a nonstandard combination of language and script. We also introduce the task of detecting distinct patterns of multilinguality based on the frequency of structured language alternations within a document.
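As a rough illustration of the idea named in the title, the sketch below turns a sequence of per-segment language labels into a binary signal and looks for a dominant frequency with a discrete Fourier transform. It is not the authors' implementation: the function names, the placeholder labels `lang_a` and `lang_b`, and the choice of one label per line are all assumptions made for this example, and the labels are assumed to come from some language identification step that is not shown.

```python
import cmath
import math


def alternation_spectrum(labels, target):
    """Magnitude spectrum of the indicator signal (1.0 where label == target)."""
    signal = [1.0 if lab == target else 0.0 for lab in labels]
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # subtract the mean so frequency 0 is empty
    spectrum = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff))
    return spectrum


def dominant_period(labels, target):
    """Return (period in segments, peak magnitude) of the strongest nonzero frequency."""
    spectrum = alternation_spectrum(labels, target)
    k = max(range(1, len(spectrum)), key=lambda i: spectrum[i])
    return len(labels) / k, spectrum[k]


# Hypothetical labels: two segments of one language, then two of another, repeated.
labels = ["lang_a", "lang_a", "lang_b", "lang_b"] * 8
period, strength = dominant_period(labels, "lang_a")
print(period)  # 4.0: the pattern repeats every four segments
```

A strong, isolated peak suggests a regular alternation (for example, a text with systematically interleaved languages), while a flat spectrum suggests irregular or no switching. The direct summation here is quadratic in the number of segments; a fast Fourier transform routine would be the usual choice for long documents.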
Anthology ID:
2024.latechclfl-1.6
Volume:
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues:
LaTeCHCLfL | WS
Publisher:
Association for Computational Linguistics
Pages:
46–50
URL:
https://aclanthology.org/2024.latechclfl-1.6
Cite (ACL):
Hale Sirin, Sabrina Li, and Thomas Lippincott. 2024. Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 46–50, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis (Sirin et al., LaTeCHCLfL-WS 2024)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2024.latechclfl-1.6.pdf
Supplementary material:
2024.latechclfl-1.6.SupplementaryMaterial.zip
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2024.latechclfl-1.6.mp4