Language and Dialect Identification of Cuneiform Texts
Tommi Jauhiainen, Heidi Jauhiainen, Tero Alstola, Krister Lindén
Abstract
This article introduces a corpus of cuneiform texts from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, along with some preliminary language identification experiments conducted using that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here represent the first time that automatic language identification methods have been used on cuneiform data.
- Anthology ID:
- W19-1409
- Volume:
- Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- June
- Year:
- 2019
- Address:
- Ann Arbor, Michigan
- Editors:
- Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 89–98
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/W19-1409/
- DOI:
- 10.18653/v1/W19-1409
- Cite (ACL):
- Tommi Jauhiainen, Heidi Jauhiainen, Tero Alstola, and Krister Lindén. 2019. Language and Dialect Identification of Cuneiform Texts. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 89–98, Ann Arbor, Michigan. Association for Computational Linguistics.
- Cite (Informal):
- Language and Dialect Identification of Cuneiform Texts (Jauhiainen et al., VarDial 2019)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/W19-1409.pdf