Towards a general and extensible phrase-extraction algorithm
Wang Ling, Tiago Luís, João Graça, Luísa Coheur, Isabel Trancoso
Abstract
Phrase-based systems depend heavily on the quality of their phrase tables, so phrase extraction is a fundamental step. In this paper we present a general and extensible phrase extraction algorithm in which we highlight several control points. Instantiating these control points allows previous approaches to be simulated, since different strategies and heuristics can be tested at each point. We show how previous approaches fit into this algorithm, compare several of them and, in addition, propose alternative heuristics, showing their impact on the final translation results. In two test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we obtained improvements of 2.4 and 2.8 BLEU points, respectively.
- Anthology ID:
- 2010.iwslt-papers.14
- Volume:
- Proceedings of the 7th International Workshop on Spoken Language Translation: Papers
- Month:
- December 2-3
- Year:
- 2010
- Address:
- Paris, France
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Pages:
- 313–320
- URL:
- https://aclanthology.org/2010.iwslt-papers.14
- Cite (ACL):
- Wang Ling, Tiago Luís, João Graça, Luísa Coheur, and Isabel Trancoso. 2010. Towards a general and extensible phrase-extraction algorithm. In Proceedings of the 7th International Workshop on Spoken Language Translation: Papers, pages 313–320, Paris, France.
- Cite (Informal):
- Towards a general and extensible phrase-extraction algorithm (Ling et al., IWSLT 2010)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2010.iwslt-papers.14.pdf