Improving domain-specific word alignment with a general bilingual corpus

Hua Wu (吴华); Haifeng Wang

Abstract

In conventional word alignment methods, some employ statistical models or statistical measures, which need large-scale bilingual sentence-aligned training corpora. Others employ dictionaries to guide alignment selection. However, these methods achieve unsatisfactory alignment results when performing word alignment on a small-scale domain-specific bilingual corpus without terminological lexicons. This paper proposes an approach to improve word alignment in a specific domain, in which only a small-scale domain-specific corpus is available, by adapting the word alignment information in the general domain to the specific domain. This approach first trains two statistical word alignment models with the large-scale corpus in the general domain and the small-scale corpus in the specific domain respectively, and then improves the domain-specific word alignment with these two models. Experimental results show a significant improvement in terms of both alignment precision and recall, achieving a relative error rate reduction of 21.96% as compared with state-of-the-art technologies.
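
The "relative error rate reduction" quoted above can be read as the fraction of the baseline's alignment errors that the improved system removes. The sketch below illustrates only that arithmetic. It assumes the error rate is defined as 1 minus the F-measure of precision and recall, which is one common convention and is not confirmed by the abstract. The function names and the numbers are hypothetical and are not results from the paper.

```python
def alignment_error_rate(precision: float, recall: float) -> float:
    """Error rate as 1 - F-measure (an assumed definition, not from the abstract)."""
    if precision + recall == 0:
        return 1.0
    f_measure = 2 * precision * recall / (precision + recall)
    return 1.0 - f_measure


def relative_error_reduction(baseline_error: float, improved_error: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical error rates, chosen only to illustrate the calculation.
    baseline_error = 0.40
    improved_error = 0.30
    reduction = relative_error_reduction(baseline_error, improved_error)
    print(f"relative error rate reduction: {reduction:.2%}")
```

With these illustrative inputs, a drop from 0.40 to 0.30 is a 10-point absolute reduction but a 25% relative one, which is why relative figures such as the one in the abstract should be read against the baseline error.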

Anthology ID: 2004.amta-papers.29
Volume: Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month: September 28 - October 2
Year: 2004
Address: Washington, USA
Editors: Robert E. Frederking, Kathryn B. Taylor
Venue: AMTA
Publisher: Springer
Pages: 262–271
URL: https://link.springer.com/chapter/10.1007/978-3-540-30194-3_29
Cite (ACL): Hua Wu and Haifeng Wang. 2004. Improving domain-specific word alignment with a general bilingual corpus. In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 262–271, Washington, USA. Springer.
Cite (Informal): Improving domain-specific word alignment with a general bilingual corpus (Wu & Wang, AMTA 2004)
PDF: https://link.springer.com/chapter/10.1007/978-3-540-30194-3_29
