@inproceedings{dorkin-sirts-2024-tartunlp,
title = "{T}artu{NLP} @ {SIGTYP} 2024 Shared Task: Adapting {XLM}-{R}o{BERT}a for Ancient and Historical Languages",
author = "Dorkin, Aleksei and
Sirts, Kairit",
editor = "Hahn, Michael and
Sorokin, Alexey and
Kumar, Ritesh and
Shcherbakov, Andreas and
Otmakhova, Yulia and
Yang, Jinrui and
Serikov, Oleg and
Rani, Priya and
Ponti, Edoardo M. and
Murado{\u{g}}lu, Saliha and
Gao, Rena and
Cotterell, Ryan and
Vylomova, Ekaterina",
booktitle = "Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP",
month = mar,
year = "2024",
address = "St. Julian's, Malta",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.sigtyp-1.15/",
pages = "120--130",
abstract = "We present our submission to the unconstrained subtask of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages for morphological annotation, POS-tagging, lemmatization, characterand word-level gap-filling. We developed a simple, uniform, and computationally lightweight approach based on the adapters framework using parameter-efficient fine-tuning. We applied the same adapter-based approach uniformly to all tasks and 16 languages by fine-tuning stacked language- and task-specific adapters. Our submission obtained an overall second place out of three submissions, with the first place in word-level gap-filling. Our results show the feasibility of adapting language models pre-trained on modern languages to historical and ancient languages via adapter training."
}
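
The stacked language- and task-specific adapter setup described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical example using the AdapterHub "adapters" library on top of XLM-RoBERTa; the adapter names, tagging head, and label count are illustrative assumptions, not the authors' exact configuration.

    # Minimal sketch of stacked language + task adapters on XLM-RoBERTa
    # (assumes the AdapterHub `adapters` library; names and label count
    # are illustrative, not the paper's exact setup).
    from adapters import AutoAdapterModel
    from adapters.composition import Stack

    model = AutoAdapterModel.from_pretrained("xlm-roberta-base")

    # One adapter for the ancient/historical language, one for the task.
    model.add_adapter("lang")  # hypothetical language adapter
    model.add_adapter("pos")   # hypothetical POS-tagging task adapter
    model.add_tagging_head("pos", num_labels=17)  # e.g. 17 UPOS tags

    # Train only the task adapter; the pre-trained XLM-RoBERTa weights
    # stay frozen (parameter-efficient fine-tuning).
    model.train_adapter("pos")
    # Stack the language adapter underneath the task adapter.
    model.active_adapters = Stack("lang", "pos")

In practice the language adapter would first be trained with masked language modeling on text in the target language, then frozen while the task adapter is trained, which is the standard stacked-adapter recipe the uniform approach in the paper builds on.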