Abstract
The frequent occurrence of divergences, structural differences between languages, presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.
- Anthology ID:
- 2002.amta-papers.4
- Volume:
- Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
- Month:
- October 8-12
- Year:
- 2002
- Address:
- Tiburon, USA
- Editor:
- Stephen D. Richardson
- Venue:
- AMTA
- Publisher:
- Springer
- Pages:
- 31–43
- URL:
- https://link.springer.com/chapter/10.1007/3-540-45820-4_4
- Cite (ACL):
- Bonnie Dorr, Lisa Pearl, Rebecca Hwa, and Nizar Habash. 2002. DUSTer: a method for unraveling cross-language divergences for statistical word-level alignment. In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 31–43, Tiburon, USA. Springer.
- Cite (Informal):
- DUSTer: a method for unraveling cross-language divergences for statistical word-level alignment (Dorr et al., AMTA 2002)
- PDF:
- https://link.springer.com/chapter/10.1007/3-540-45820-4_4