Guillaume Wisinewski


2011

pdf
How good are your phrases? Assessing phrase quality with single class classification
Nadi Tomeh | Marco Turchi | Guillaume Wisinewski | Alexandre Allauzen | François Yvon
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

We present a novel translation quality informed procedure for both extraction and scoring of phrase pairs in PBSMT systems. We reformulate the extraction problem in the supervised learning framework. Our goal is twofold. First, We attempt to take the translation quality into account; and second we incorporating arbitrary features in order to circumvent alignment errors. One-Class SVMs and the Mapping Convergence algorithm permit training a single-class classifier to discriminate between useful and useless phrase pairs. Such classifier can be learned from a training corpus that comprises only useful instances. The confidence score, produced by the classifier for each phrase pairs, is employed as a selection criteria. The smoothness of these scores allow a fine control over the size of the resulting translation model. Finally, confidence scores provide a new accuracy-based feature to score phrase pairs. Experimental evaluation of the method shows accurate assessments of phrase pairs quality even for regions in the space of possible phrase pairs that are ignored by other approaches. This enhanced evaluation of phrase pairs leads to improvements in the translation performance as measured by BLEU.