Cross-Lingual Named Entity Recognition via FastAlign: a Case Study

Ali Hatami, Ruslan Mitkov, Gloria Corpas Pastor


Abstract
Named Entity Recognition (NER) is an essential natural language processing task that detects entities and classifies them into predetermined categories. An entity is a meaningful word or phrase that refers to a proper noun. Named entities play an important role in NLP tasks such as Information Extraction, Question Answering and Machine Translation. In Machine Translation, named entities often cause translation failures regardless of local context, which lowers the quality of the translation output. Annotating named entities is time-consuming and expensive, especially for low-resource languages. One solution to this problem is to apply word alignment to a bilingual parallel corpus in which only one side has been annotated. The goal is to extract named entities in the target language by using the annotated source-language side of the corpus. In this paper, we compare the performance of two alignment symmetrisation heuristics, Grow-diag-final-and and Intersect Symmetrisation, for annotation projection on an English-Brazilian Portuguese bilingual corpus in order to detect named entities in Brazilian Portuguese. An NER model trained on the annotated data produced by each alignment method is used to evaluate the aligners. Experimental results show that Intersect Symmetrisation achieves higher performance scores than the Grow-diag-final-and heuristic for Brazilian Portuguese.
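The abstract compares two symmetrisation heuristics and then projects NER annotations through the resulting alignment. The Python sketch below illustrates both steps under stated assumptions: the two directional alignments (for example, from forward and reverse fast_align runs) are already given as sets of (source index, target index) pairs; `grow_diag_final_and` follows the widely used Moses formulation of the heuristic; and `project_tags` applies a simple, hypothetical span rule (tag the contiguous target range covered by an entity's aligned words), which is not necessarily the projection procedure used in the paper. All function names and the toy sentence pair are illustrative. In practice, fast_align's companion `atools` utility can perform the symmetrisation step itself.

```python
from typing import List, Set, Tuple

# Alignment links as (source index, target index) pairs.
Alignment = Set[Tuple[int, int]]

# The eight neighbouring cells used by grow-diag (horizontal, vertical, diagonal).
NEIGHBOURS = [(-1, 0), (0, -1), (1, 0), (0, 1),
              (-1, -1), (-1, 1), (1, -1), (1, 1)]


def intersect(fwd: Alignment, rev: Alignment) -> Alignment:
    """Intersect symmetrisation: keep only links found in both directions."""
    return fwd & rev


def grow_diag_final_and(fwd: Alignment, rev: Alignment,
                        src_len: int, tgt_len: int) -> Alignment:
    """Grow-diag-final-and, written after the Moses pseudo-code."""
    union = fwd | rev
    aln = set(fwd & rev)  # start from the high-precision intersection

    def src_aligned(i: int) -> bool:
        return any(a == i for a, _ in aln)

    def tgt_aligned(j: int) -> bool:
        return any(b == j for _, b in aln)

    # GROW-DIAG: repeatedly add union links next to existing links,
    # provided the new link covers a word that is still unaligned.
    added = True
    while added:
        added = False
        for i in range(src_len):
            for j in range(tgt_len):
                if (i, j) not in aln:
                    continue
                for di, dj in NEIGHBOURS:
                    link = (i + di, j + dj)
                    if link in union and link not in aln and (
                            not src_aligned(link[0]) or not tgt_aligned(link[1])):
                        aln.add(link)
                        added = True

    # FINAL-AND: add remaining directional links only if BOTH words are unaligned.
    for directed in (fwd, rev):
        for i, j in sorted(directed):
            if not src_aligned(i) and not tgt_aligned(j):
                aln.add((i, j))
    return aln


def project_tags(src_tags: List[str], aln: Alignment, tgt_len: int) -> List[str]:
    """Project source BIO tags onto target tokens through alignment links."""
    # Collect source entity spans as (start, end_exclusive, type).
    spans, start = [], None
    for i, tag in enumerate(src_tags + ["O"]):  # trailing "O" closes the last span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, src_tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i

    tgt_tags = ["O"] * tgt_len
    for s, e, etype in spans:
        targets = sorted(j for i, j in aln if s <= i < e)
        if not targets:
            continue  # no aligned target word: the entity is lost
        lo, hi = targets[0], targets[-1]
        if any(t != "O" for t in tgt_tags[lo:hi + 1]):
            continue  # skip spans that would overlap an earlier projection
        tgt_tags[lo] = "B-" + etype
        for j in range(lo + 1, hi + 1):
            tgt_tags[j] = "I-" + etype
    return tgt_tags


if __name__ == "__main__":
    # Hypothetical toy pair: "Maria visited New York" / "Maria visitou Nova Iorque".
    src_tags = ["B-PER", "O", "B-LOC", "I-LOC"]
    fwd = {(0, 0), (1, 1), (2, 2), (3, 3)}  # source-to-target direction
    rev = {(0, 0), (1, 1), (2, 2), (3, 2)}  # target-to-source, stored as (src, tgt)
    print(project_tags(src_tags, intersect(fwd, rev), 4))
    # ['B-PER', 'O', 'B-LOC', 'O']
    print(project_tags(src_tags, grow_diag_final_and(fwd, rev, 4, 4), 4))
    # ['B-PER', 'O', 'B-LOC', 'I-LOC']
```

On this toy pair, Intersect keeps only the links both directions agree on, so "Iorque" stays untagged, while grow-diag-final-and grows additional links from the union and tags the full span. Links taken from the union are less reliable than links both directions agree on, so the grow heuristic trades some precision for recall.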
Anthology ID:
2021.triton-1.10
Volume:
Proceedings of the Translation and Interpreting Technology Online Conference
Month:
July
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Vilelmini Sosoni, Julie Christine Giguère, Elena Murgolo, Elizabeth Deysel
Venue:
TRITON
Publisher:
INCOMA Ltd.
Pages:
85–92
URL:
https://aclanthology.org/2021.triton-1.10
Cite (ACL):
Ali Hatami, Ruslan Mitkov, and Gloria Corpas Pastor. 2021. Cross-Lingual Named Entity Recognition via FastAlign: a Case Study. In Proceedings of the Translation and Interpreting Technology Online Conference, pages 85–92, Held Online. INCOMA Ltd.
Cite (Informal):
Cross-Lingual Named Entity Recognition via FastAlign: a Case Study (Hatami et al., TRITON 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2021.triton-1.10.pdf