Abstract
This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based, an HMM-based, and a post-processed HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better with dictionaries where the font is not an important distinguishing feature for determining information types; (2) the post-processed stochastic method improves the results of the stochastic method for phrasal entries; and (3) our resulting bilingual lexicons are comprehensive enough to provide the basis for reasonable translation results when compared to human translations.
- Anthology ID:
- 2003.mtsummit-papers.28
- Volume:
- Proceedings of Machine Translation Summit IX: Papers
- Month:
- September 23-27
- Year:
- 2003
- Address:
- New Orleans, USA
- Venue:
- MTSummit
- URL:
- https://aclanthology.org/2003.mtsummit-papers.28
- Cite (ACL):
- Burcu Karagol-Ayan, David Doermann, and Bonnie J. Dorr. 2003. Acquisition of bilingual MT lexicons from OCRed dictionaries. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
- Cite (Informal):
- Acquisition of bilingual MT lexicons from OCRed dictionaries (Karagol-Ayan et al., MTSummit 2003)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2003.mtsummit-papers.28.pdf