Word Segmentation for Akkadian Cuneiform

Timo Homburg, Christian Chiarcos


Abstract
We present experiments on word segmentation for Akkadian cuneiform, an ancient writing system and a language used for about three millennia in the ancient Near East. To the best of our knowledge, this is the first study of its kind applied to either the Akkadian language or the cuneiform writing system. Because cuneiform is a logosyllabic writing system that structurally resembles East Asian writing systems, we employ word segmentation algorithms originally developed for Chinese and Japanese. We describe the results of rule-based, dictionary-based, statistical, and machine learning approaches. Our results suggest promising steps in cuneiform word segmentation that could enable and improve natural language processing in this area.
Anthology ID:
L16-1642
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4067–4074
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/L16-1642/
Cite (ACL):
Timo Homburg and Christian Chiarcos. 2016. Word Segmentation for Akkadian Cuneiform. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4067–4074, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Word Segmentation for Akkadian Cuneiform (Homburg & Chiarcos, LREC 2016)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/L16-1642.pdf