Allen Institute for AI @ SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages

Lester James V. Miranda

Abstract

In this paper, we describe Allen AI’s submission to the constrained track of the SIGTYP 2024 Shared Task. Using only the data provided by the organizers, we pretrained a transformer-based multilingual model, then finetuned it on the Universal Dependencies (UD) annotations of a given language for a downstream task. Our systems achieved decent performance on the test set, beating the baseline in most language-task pairs, yet struggled with subtoken tags in multiword expressions, as seen in Coptic and Ancient Hebrew. On the validation set, we obtained ≥70% F1-score on most language-task pairs. In addition, we explored the cross-lingual capability of our trained models. This paper highlights our pretraining and finetuning process and the findings from our internal evaluations.

Anthology ID: 2024.sigtyp-1.18
Volume: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month: March
Year: 2024
Address: St. Julian's, Malta
Editors: Michael Hahn, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Yulia Otmakhova, Jinrui Yang, Oleg Serikov, Priya Rani, Edoardo M. Ponti, Saliha Muradoğlu, Rena Gao, Ryan Cotterell, Ekaterina Vylomova
Venues: SIGTYP | WS
Publisher: Association for Computational Linguistics
Pages: 151–159
URL: https://preview.aclanthology.org/fix-sig-urls/2024.sigtyp-1.18/
Cite (ACL): Lester James V. Miranda. 2024. Allen Institute for AI @ SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages. In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 151–159, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal): Allen Institute for AI @ SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages (Miranda, SIGTYP 2024)
PDF: https://preview.aclanthology.org/fix-sig-urls/2024.sigtyp-1.18.pdf
Video: https://preview.aclanthology.org/fix-sig-urls/2024.sigtyp-1.18.mp4
