Abstract
We improve our recently proposed technique for integrating Arabic verb-subject constructions into SMT word alignment (Carpuat et al., 2010) by distinguishing between matrix (main clause) and non-matrix Arabic verb-subject (VS) constructions. In gold translations, most matrix VS constructions are translated in inverted SV order, while non-matrix (subordinate clause) VS constructions are inverted in only half of the cases. In addition, while detecting verbs and their subjects is a hard task, our syntactic parser detects VS constructions more accurately in matrix than in non-matrix clauses. As a result, reordering only matrix VS constructions for word alignment consistently improves translation quality over a phrase-based SMT baseline, and over reordering all VS constructions, in both medium- and large-scale settings. Remarkably, the improvements obtained by reordering matrix VS in the medium-scale setting represent 44% of the BLEU gain and 51% of the TER gain obtained with a word alignment training bitext that is 5 times larger.

- Anthology ID:
- 2010.jeptalnrecital-long.30
- Volume:
- Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
- Month:
- July
- Year:
- 2010
- Address:
- Montréal, Canada
- Editors:
- Philippe Langlais, Michel Gagnon
- Venue:
- JEP/TALN/RECITAL
- Publisher:
- ATALA
- Pages:
- 292–301
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2010.jeptalnrecital-long.30/
- Cite (ACL):
- Marine Carpuat, Yuval Marton, and Nizar Habash. 2010. Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT. In Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, pages 292–301, Montréal, Canada. ATALA.
- Cite (Informal):
- Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT (Carpuat et al., JEP/TALN/RECITAL 2010)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2010.jeptalnrecital-long.30.pdf
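The abstract describes reordering only matrix VS constructions into SV order before word alignment. The Python sketch below shows one way such a step could work. It is not the authors' code, and it rests on several assumptions. First, VS constructions are supplied by a parser as a verb index, a contiguous post-verbal subject span and a matrix flag. Second, spans do not overlap. Third, alignment links computed on the reordered Arabic are mapped back to the original word order afterwards. The names `VSConstruction`, `reorder_matrix_vs` and `restore_alignment` are hypothetical, and the example tokens are made-up English glosses standing in for Arabic words.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class VSConstruction:
    """Hypothetical annotation of one verb-subject construction (assumed to come from a parser)."""
    verb: int        # token index of the verb
    subj_start: int  # index of the first token of the post-verbal subject
    subj_end: int    # index one past the last token of the subject
    matrix: bool     # True if the verb heads a main (matrix) clause


def reorder_matrix_vs(tokens: Sequence[str],
                      constructions: Sequence[VSConstruction]) -> Tuple[List[str], List[int]]:
    """Move each matrix subject in front of its verb; non-matrix VS is left in place.

    Returns the reordered tokens and perm, where perm[i] is the original index
    of the token at reordered position i. Assumes spans do not overlap.
    """
    perm = list(range(len(tokens)))
    matrix_vs = [c for c in constructions if c.matrix]
    # Right-to-left, so rewriting one span never shifts the indices of spans to its left.
    for c in sorted(matrix_vs, key=lambda c: c.verb, reverse=True):
        if not c.verb < c.subj_start < c.subj_end <= len(tokens):
            raise ValueError(f"not a post-verbal subject span: {c}")
        v, s, e = c.verb, c.subj_start, c.subj_end
        # [verb + material before subject][subject] -> [subject][verb + material before subject]
        perm[v:e] = perm[s:e] + perm[v:s]
    return [tokens[i] for i in perm], perm


def restore_alignment(links: Sequence[Tuple[int, int]],
                      perm: Sequence[int]) -> List[Tuple[int, int]]:
    """Map (reordered_source_index, target_index) links back to original source positions."""
    return sorted((perm[i], j) for i, j in links)


# Hypothetical VSO sentence, written as English glosses of Arabic tokens.
src = ["met", "the-minister", "the-delegation"]
vs = [VSConstruction(verb=0, subj_start=1, subj_end=2, matrix=True)]
reordered, perm = reorder_matrix_vs(src, vs)
print(reordered)  # ['the-minister', 'met', 'the-delegation']

# Links an aligner might produce for the reordered source vs. "the minister met the delegation".
links = [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]
print(restore_alignment(links, perm))  # [(0, 2), (1, 0), (1, 1), (2, 3), (2, 4)]
```

Processing constructions from right to left keeps the sketch simple because every rewrite stays inside its own span. Nested constructions, such as a VS clause embedded in a subject, would need a different strategy. Setting `matrix=False` on a construction leaves the sentence untouched. This mirrors the choice, described in the abstract, to reorder only matrix VS.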