Abstract
We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, dependency-based and constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs. We automatically measure PB-SMT quality. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields greater translation quality than constituency annotation for PB-SMT.
- Anthology ID:
- 2008.jeptalnrecital-court.14
- Volume:
- Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
- Month:
- June
- Year:
- 2008
- Address:
- Avignon, France
- Editors:
- Frédéric Béchet, Jean-Francois Bonastre
- Venue:
- JEP/TALN/RECITAL
- Publisher:
- ATALA
- Pages:
- 131–140
- URL:
- https://aclanthology.org/2008.jeptalnrecital-court.14
- Cite (ACL):
- Mary Hearne, Sylwia Ozdowska, and John Tinsley. 2008. Comparing Constituency and Dependency Representations for SMT Phrase-Extraction. In Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, pages 131–140, Avignon, France. ATALA.
- Cite (Informal):
- Comparing Constituency and Dependency Representations for SMT Phrase-Extraction (Hearne et al., JEP/TALN/RECITAL 2008)
- PDF:
- https://preview.aclanthology.org/jeptaln-2024-ingestion/2008.jeptalnrecital-court.14.pdf