Alignment of bilingual named entities in parallel corpora using statistical model

Chun-Jen Lee, Jason S. Chang, Thomas C. Chuang


Abstract
Named entities make up a large share of document content, and extracting them is crucial to various applications of natural language processing. Although efforts to identify named entities within monolingual documents are numerous, extracting bilingual named entities has not been investigated extensively owing to the complexity of the task. In this paper, we describe a statistical phrase translation model and a statistical transliteration model, and we propose a new method based on these models to align bilingual named entities in parallel corpora. Experimental results indicate that a satisfactory precision rate can be achieved. We also describe how to improve the proposed method by incorporating approximate matching and person name recognition; experimental results show that performance improves significantly with these enhancements.
Anthology ID:
2004.amta-papers.17
Volume:
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
September 28 - October 2
Year:
2004
Address:
Washington, USA
Venue:
AMTA
Publisher:
Springer
Pages:
144–153
URL:
https://link.springer.com/chapter/10.1007/978-3-540-30194-3_17
Cite (ACL):
Chun-Jen Lee, Jason S. Chang, and Thomas C. Chuang. 2004. Alignment of bilingual named entities in parallel corpora using statistical model. In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 144–153, Washington, USA. Springer.
Cite (Informal):
Alignment of bilingual named entities in parallel corpora using statistical model (Lee et al., AMTA 2004)