Towards the First Named Entity Recognition of Inuktitut for an Improved Machine Translation

Ngoc Tan Le, Soumia Kasdi, Fatiha Sadat


Abstract
Named Entity Recognition (NER) is a crucial step in ensuring the quality of several Natural Language Processing applications and tools, including machine translation and information retrieval. Moreover, it is considered a fundamental module of many Natural Language Understanding tasks such as question-answering systems. This paper presents a first study on NER for Inuktitut, an under-represented Indigenous Inuit language of Canada that lacks linguistic resources and large labeled datasets. Our proposed NER model for Inuktitut is built by transferring linguistic characteristics from English to Inuktitut, based on either rules or bilingual word embeddings. We provide an empirical study based on a comparison with state-of-the-art models, as well as intrinsic and extrinsic evaluations. The results obtained, in terms of Recall, Precision and F-score, show the effectiveness of the proposed NER methods. Furthermore, the proposed NER improved the performance of Inuktitut-English Neural Machine Translation.
Anthology ID:
2023.americasnlp-1.10
Volume:
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
84–93
URL:
https://aclanthology.org/2023.americasnlp-1.10
DOI:
10.18653/v1/2023.americasnlp-1.10
Cite (ACL):
Ngoc Tan Le, Soumia Kasdi, and Fatiha Sadat. 2023. Towards the First Named Entity Recognition of Inuktitut for an Improved Machine Translation. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 84–93, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Towards the First Named Entity Recognition of Inuktitut for an Improved Machine Translation (Le et al., AmericasNLP 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.americasnlp-1.10.pdf