Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT

Marine Carpuat, Yuval Marton, Nizar Habash


Abstract
We improve our recently proposed technique for integrating Arabic verb-subject (VS) constructions in SMT word alignment (Carpuat et al., 2010) by distinguishing between matrix (main clause) and non-matrix (subordinate clause) VS constructions. In gold translations, most matrix VS constructions are translated in inverted SV order, while non-matrix VS constructions are inverted in only half the cases. In addition, while detecting verbs and their subjects is a hard task, our syntactic parser detects VS constructions better in matrix than in non-matrix clauses. As a result, reordering only matrix VS for word alignment consistently improves translation quality over a phrase-based SMT baseline, and over reordering all VS constructions, in both medium- and large-scale settings. In fact, the improvements obtained by reordering matrix VS in the medium-scale setting represent 44% of the gain in BLEU and 51% of the gain in TER obtained with a word alignment training bitext that is 5 times larger.
Anthology ID:
2010.jeptalnrecital-long.30
Volume:
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Month:
July
Year:
2010
Address:
Montréal, Canada
Editors:
Philippe Langlais, Michel Gagnon
Venue:
JEP/TALN/RECITAL
Publisher:
ATALA
Pages:
292–301
URL:
https://aclanthology.org/2010.jeptalnrecital-long.30
Cite (ACL):
Marine Carpuat, Yuval Marton, and Nizar Habash. 2010. Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT. In Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, pages 292–301, Montréal, Canada. ATALA.
Cite (Informal):
Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT (Carpuat et al., JEP/TALN/RECITAL 2010)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2010.jeptalnrecital-long.30.pdf