Emma Yavasan




2025

From Clay to Code: Transforming Hittite Texts for Machine Learning
Emma Yavasan | Shai Gordin
Proceedings of the Second Workshop on Ancient Language Processing

This paper presents a comprehensive methodology for transforming XML-encoded Hittite cuneiform texts into computationally accessible formats for machine learning applications. Drawing from a corpus of 8,898 texts (558,349 tokens in total) encompassing 145 cataloged genres and compositions, we develop a structured approach to preserve both linguistic and philological annotations while enabling computational analysis. Our methodology addresses key challenges in ancient language processing, including the handling of fragmentary texts, multiple language layers, and complex annotation systems. We demonstrate the application of our corpus through experiments with T5 models, achieving significant improvements in Hittite-to-German translation (ROUGE-1: 0.895) while identifying limitations in morphological glossing tasks. This work establishes a standardized, machine-readable dataset in Hittite cuneiform that balances philological accuracy with the current state of the art.