Henok Biadglign Ademtew
2025
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding
Israel Abebe Azime | Atnafu Lambebo Tonja | Tadesse Destaw Belay | Yonas Chanie | Bontu Fufa Balcha | Negasi Haile Abadi | Henok Biadglign Ademtew | Mulubrhan Abebe Nerea | Debela Desalegn Yadeta | Derartu Dagne Geremew | Assefa Atsbiha Tesfu | Philipp Slusallek | Thamar Solorio | Dietrich Klakow
Findings of the Association for Computational Linguistics: NAACL 2025
2024
AGE: Amharic, Ge’ez and English Parallel Dataset
Henok Biadglign Ademtew | Mikiyas Girma Birbo
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
African languages are not well represented in Natural Language Processing (NLP), mainly because of a lack of resources for training models. Low-resource languages such as Amharic and Ge’ez cannot benefit from modern NLP methods because high-quality datasets are scarce. This paper presents AGE, an open-source, tripartite-aligned Amharic, Ge’ez, and English parallel dataset. Additionally, we introduce a novel set of 1,000 Ge’ez-centered sentences sourced from domains such as news and novels. Furthermore, we develop a model based on a multilingual pre-trained language model, which achieves scores of 12.29 and 30.66 for English-to-Ge’ez and Ge’ez-to-English, respectively, and 9.39 and 12.29 for Amharic-to-Ge’ez and Ge’ez-to-Amharic, respectively.