Towards a general and extensible phrase-extraction algorithm

Wang Ling, Tiago Luís, João Graça, Luísa Coheur, Isabel Trancoso


Abstract
Phrase-based systems depend heavily on the quality of their phrase tables, so phrase extraction is a fundamental step. In this paper we present a general and extensible phrase extraction algorithm in which we highlight several control points. Because different strategies or heuristics can be tested at each of these points, instantiating them allows previous approaches to be simulated. We show how previous approaches fit into this algorithm, compare several of them, and propose alternative heuristics, showing their impact on the final translation results. In two test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we obtained improvements of 2.4 and 2.8 BLEU points, respectively.
Anthology ID:
2010.iwslt-papers.14
Volume:
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers
Month:
December 2-3
Year:
2010
Address:
Paris, France
Venue:
IWSLT
SIG:
SIGSLT
Pages:
313–320
URL:
https://aclanthology.org/2010.iwslt-papers.14
Cite (ACL):
Wang Ling, Tiago Luís, João Graça, Luísa Coheur, and Isabel Trancoso. 2010. Towards a general and extensible phrase-extraction algorithm. In Proceedings of the 7th International Workshop on Spoken Language Translation: Papers, pages 313–320, Paris, France.
Cite (Informal):
Towards a general and extensible phrase-extraction algorithm (Ling et al., IWSLT 2010)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2010.iwslt-papers.14.pdf