Cheap Translation for Cross-Lingual Named Entity Recognition

Stephen Mayhew; Chen-Tse Tsai; Dan Roth

doi:10.18653/v1/D17-1269

Abstract

Recent work in NLP has attempted to deal with low-resource languages but still assumed a resource level that is not present for most languages, e.g., the availability of Wikipedia in the target language. We propose a simple method for cross-lingual named entity recognition (NER) that works well in settings with minimal resources. Our approach uses a lexicon to “translate” annotated data available in one or several high-resource languages into the target language, and learns a standard monolingual NER model there. Further, when Wikipedia is available in the target language, our method can enhance Wikipedia-based methods to yield state-of-the-art NER results; we evaluate on 7 diverse languages, improving the state of the art by an average of 5.5% F1 points. Because it requires so few resources, this is an extremely portable cross-lingual NER approach, as illustrated using a truly low-resource language, Uyghur.
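The lexicon-based "translation" step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain word-by-word dictionary lookup, BIO-style tags, and a made-up toy lexicon, and the names `translate_sentence` and `toy_lexicon` are hypothetical. The only point it shows is that the annotations carry over unchanged to the translated tokens, which is what lets a standard monolingual NER model be trained on the output.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, NER tag in BIO format)


def translate_sentence(sentence: List[Token], lexicon: Dict[str, str]) -> List[Token]:
    """Word-by-word lexicon lookup that keeps each token's NER tag.

    Assumptions (not taken from the paper): lookups are lowercased,
    words missing from the lexicon are copied unchanged, and a
    multi-word translation continues a B- tag with I- tags.
    """
    translated: List[Token] = []
    for word, tag in sentence:
        target_words = lexicon.get(word.lower(), word).split()
        for i, target_word in enumerate(target_words):
            if i > 0 and tag.startswith("B-"):
                out_tag = "I-" + tag[2:]  # continue the entity span
            else:
                out_tag = tag
            translated.append((target_word, out_tag))
    return translated


# Toy, hand-made English-to-Spanish entries, for illustration only.
toy_lexicon = {"lives": "vive", "in": "en"}

source = [("Maria", "B-PER"), ("lives", "O"), ("in", "O"), ("Madrid", "B-LOC")]
translated = translate_sentence(source, toy_lexicon)
for word, tag in translated:
    print(word, tag)
```

Here the names "Maria" and "Madrid" are absent from the toy lexicon and are copied through with their tags intact. A real system would also have to choose among several candidate translations per word, which this sketch ignores.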

Anthology ID: D17-1269
Volume: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month: September
Year: 2017
Address: Copenhagen, Denmark
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 2536–2545
URL: https://aclanthology.org/D17-1269
DOI: 10.18653/v1/D17-1269
Cite (ACL): Stephen Mayhew, Chen-Tse Tsai, and Dan Roth. 2017. Cheap Translation for Cross-Lingual Named Entity Recognition. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2536–2545, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Cheap Translation for Cross-Lingual Named Entity Recognition (Mayhew et al., EMNLP 2017)
PDF: https://preview.aclanthology.org/paclic-22-ingestion/D17-1269.pdf
Data: Panlex
