Tilahun Abedissa Taffa
2026
Amharic DBpedia Chapter: A Knowledge Graph for a Low-Resource Language
HIzkiel Mitiku Alemayehu | Tilahun Abedissa Taffa | Meti Adane Bayissa | Andargachew Asfaw Zewge | Hamada Zahera | Ricardo Usbeck | Axel-Cyrille Ngonga Ngomo
Proceedings of the Fifteenth Language Resources and Evaluation Conference
DBpedia is a community-driven project that extracts structured knowledge from Wikipedia via language-specific chapters. We present the first steps toward the Amharic DBpedia chapter by extending the DBpedia Extraction Framework (DEF) to support Amharic Wikipedia. The extension adds language-specific components (Ethiopian date parsers, an Ethiopian–Gregorian calendar converter, an Arabic–Geʽez numeral converter, and Amharic template mappings) together with automated extraction pipelines, and publishes the resulting knowledge graph through a live website, a DBpedia Databus collection, and query endpoints. For mapping, we evaluate the zero-shot NLLB-200 translation model on Amharic infobox property names, achieving a BLEU score of 45.31. For ontology alignment, we link mapped properties to DBpedia ontology properties across 58 DBpedia classes and benchmark multilingual encoders with Amharic support, including Afro-XLM-R Base, XLM-R Base, and Amharic fine-tuned mBERT. The fine-tuned Afro-XLM-R model achieves 92.1% Top-10 accuracy and strong ranking performance as measured by Mean Reciprocal Rank (MRR). We release all resources developed for the Amharic DBpedia chapter, including the Ethiopian date parser, Ethiopian–Gregorian calendar converter, Arabic–Geʽez numeral converter, Amharic template mappings, automated extraction workflows, and the resulting Amharic DBpedia knowledge graph, with public access via the DBpedia Databus collection, the Tentris query endpoint, and the live website at am.dbpedia.org.
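The abstract reports Top-10 accuracy and MRR for the ontology alignment task. As a minimal sketch of how such ranking metrics are commonly computed, assume each source property has exactly one gold ontology property and the model returns a ranked candidate list. The function names and the toy rankings below are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold item (1-based), or 0.0 if it is not ranked."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], golds: Sequence[str]) -> float:
    """Average the reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds)) / len(golds)


def top_k_accuracy(rankings: Sequence[Sequence[str]], golds: Sequence[str], k: int = 10) -> float:
    """Fraction of queries whose gold item appears among the first k candidates."""
    hits = sum(gold in ranked[:k] for ranked, gold in zip(rankings, golds))
    return hits / len(golds)


# Hypothetical toy data: three queries, each with a ranked candidate list.
rankings = [
    ["dbo:birthDate", "dbo:birthPlace", "dbo:deathDate"],   # gold at rank 1
    ["dbo:country", "dbo:capital", "dbo:populationTotal"],  # gold at rank 2
    ["dbo:author", "dbo:publisher", "dbo:language"],        # gold not ranked
]
golds = ["dbo:birthDate", "dbo:capital", "dbo:isbn"]

print(mean_reciprocal_rank(rankings, golds))
print(top_k_accuracy(rankings, golds, k=2))
```

On this toy input the reciprocal ranks are 1, 1/2, and 0, so the MRR is 0.5, and two of the three gold properties fall within the top 2.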
2024
Low Resource Question Answering: An Amharic Benchmarking Dataset
Tilahun Abedissa Taffa | Ricardo Usbeck | Yaregal Assabie
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Question Answering (QA) systems return concise answers or answer lists to natural language questions, drawing on a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. While QA datasets for languages such as English are surging, the same is not true for low-resource languages like Amharic; indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading comprehension settings, respectively.
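The abstract does not say how the F-score is computed. Extractive QA benchmarks commonly use a token-overlap F1 between the predicted and reference answer spans, so the sketch below assumes that definition with plain whitespace tokenization. The function name and the example strings are hypothetical, and the paper's own scoring may normalize text differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two answer strings (whitespace tokens)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction contains one extra token.
print(token_f1("አዲስ አበባ ከተማ", "አዲስ አበባ"))
```

Here two of the three predicted tokens match the two reference tokens, giving precision 2/3, recall 1, and an F1 of 0.8.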
2019
Amharic Question Answering for Biography, Definition, and Description Questions
Tilahun Abedissa Taffa | Mulugeta Libsie
Proceedings of the 2019 Workshop on Widening NLP
A broad range of information needs can often be stated as a question. Question Answering (QA) systems attempt to provide users with concise answers to natural language questions. The existing Amharic QA systems handle fact-based questions that usually take named entities as an answer. To deal with more complex information needs, we developed an Amharic non-factoid QA system for biography, definition, and description questions. A hybrid approach has been used for question classification. For document filtering and answer extraction, we have used lexical patterns. To answer biography questions, on the other hand, we have used a summarizer, and the generated summary is validated using a text classifier. Our QA system has been evaluated and shows promising results.