Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models
Mauro Cettolo, Marcello Federico, Daniele Pighin, Nicola Bertoldi
Abstract
This work extends phrase-based statistical MT (SMT) with shallow-syntax dependencies. Two string-to-chunks translation models are proposed: a factored model, which augments phrase-based SMT with layered dependencies, and a joint model, which extends the phrase translation table with micro-tags, i.e. per-word projections of chunk labels. Both rely on n-gram models of target sequences at different granularities: single words, micro-tags, and chunks. In particular, n-grams defined over syntactic chunks are intended to model syntactic constraints that cope with word-group movements. Experimental analysis and evaluation conducted on two popular Chinese-English tasks suggest that the shallow-syntax joint translation model has the potential to outperform state-of-the-art phrase-based translation, with a reasonable computational overhead.
- Anthology ID:
- 2008.amta-papers.3
- Volume:
- Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
- Month:
- October 21-25
- Year:
- 2008
- Address:
- Waikiki, USA
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 56–64
- URL:
- https://aclanthology.org/2008.amta-papers.3
- Cite (ACL):
- Mauro Cettolo, Marcello Federico, Daniele Pighin, and Nicola Bertoldi. 2008. Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 56–64, Waikiki, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models (Cettolo et al., AMTA 2008)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2008.amta-papers.3.pdf