Abstract
In this study, we present a generalizable workflow for identifying documents in Armeno-Turkish, a historical language with a nonstandard combination of language and script. We introduce the task of detecting distinct patterns of multilinguality based on the frequency of structured language alternations within a document.
- Anthology ID:
- 2024.latechclfl-1.6
- Volume:
- Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
- Month:
- March
- Year:
- 2024
- Address:
- St. Julians, Malta
- Editors:
- Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
- Venues:
- LaTeCHCLfL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 46–50
- URL:
- https://aclanthology.org/2024.latechclfl-1.6
- Cite (ACL):
- Hale Sirin, Sabrina Li, and Thomas Lippincott. 2024. Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 46–50, St. Julians, Malta. Association for Computational Linguistics.
- Cite (Informal):
- Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis (Sirin et al., LaTeCHCLfL-WS 2024)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2024.latechclfl-1.6.pdf
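
The abstract and title describe combining language identification with Fourier analysis to detect structured language alternations. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes a language identifier has already assigned one label to each segment of a document, and it uses a naive discrete Fourier transform to estimate the dominant alternation period. The function name `dominant_period` and the labels `lang_a` and `lang_b` are hypothetical.

```python
import cmath


def dominant_period(labels, target):
    """Estimate how often `target` recurs in `labels`, in segments.

    `labels` is assumed to be the per-segment output of some language
    identifier (hypothetical input; the paper's pipeline may differ).
    Returns None when the labels show no alternation at all.
    """
    n = len(labels)
    if n == 0:
        return None
    # Binary indicator: 1.0 where the segment carries the target label.
    signal = [1.0 if lab == target else 0.0 for lab in labels]
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant component

    best_k, best_power = 0, 0.0
    # Naive O(n^2) discrete Fourier transform over positive frequencies.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power

    if best_k == 0:
        return None  # flat signal: a single language throughout
    return n / best_k  # period of the strongest frequency, in segments


# Two segments of one language, then two of another, repeated.
labels = ["lang_a", "lang_a", "lang_b", "lang_b"] * 8
period = dominant_period(labels, "lang_a")
```

In this toy input the pattern repeats every four segments, so the strongest frequency corresponds to a period of four. A monolingual document yields no peak and the function returns `None`. Real language-identification output is noisy, so in practice the spectrum would need to be inspected more carefully than a single maximum.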