Using Log-linear Models for Tuning Machine Translation Output

Michael Carl


Abstract
We describe a set of experiments exploring statistical techniques for ranking and selecting the best translations in a graph of translation hypotheses. In a previous paper (Carl, 2007) we described how the graph of hypotheses is generated through shallow transfer and chunk permutation rules, where nodes consist of vectors representing morpho-syntactic properties of words and phrases. This paper describes a number of methods for training statistical feature functions from some of the vectors' components. The feature functions are trained off-line on different types of text, and their log-linear combination is then used to retrieve the best translation paths in the graph. We compare two language modelling toolkits, the CMU and SRI toolkits, and arrive at three results: 1) lemma-based feature functions produce better results than token-based models, 2) adding a PoS-tag feature function to the lemma models improves the output, and 3) weights for lexical translations are only suitable if the training material is similar to the texts to be translated.
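As a rough illustration of the ranking step described in the abstract, the sketch below scores translation hypotheses with a log-linear combination of feature functions and keeps the highest-scoring path. The feature functions, weights, and hypothesis representation here are invented placeholders for illustration only; they are not the models trained in the paper.

```python
import math

# Hypothetical feature functions: each maps a candidate translation
# (here just a token list) to a probability-like score. In the paper
# these would be language models over tokens, lemmas, or PoS tags,
# plus lexical translation weights.
def lemma_lm(tokens):           # placeholder lemma-based language model
    return 10.0 ** -len(tokens)

def pos_lm(tokens):             # placeholder PoS-tag language model
    return 10.0 ** -(len(tokens) / 2.0)

def lex_weight(tokens):         # placeholder lexical translation weight
    return 0.5

FEATURES = [lemma_lm, pos_lm, lex_weight]
LAMBDAS = [0.5, 0.3, 0.2]       # assumed feature weights (would be tuned)

def loglinear_score(tokens):
    """Weighted sum of log feature scores: sum_i lambda_i * log f_i(t)."""
    return sum(lam * math.log(f(tokens))
               for lam, f in zip(LAMBDAS, FEATURES))

def best_translation(hypotheses):
    """Return the hypothesis (token list) with the highest log-linear score."""
    return max(hypotheses, key=loglinear_score)

if __name__ == "__main__":
    candidates = [["the", "house", "is", "red"],
                  ["red", "is", "the", "house"]]
    print(best_translation(candidates))
```

In the paper's setting, the hypotheses would be paths through the translation graph rather than flat token lists, but the selection principle, maximizing a weighted sum of log feature scores, is the same.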
Anthology ID:
L08-1110
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/292_paper.pdf
Cite (ACL):
Michael Carl. 2008. Using Log-linear Models for Tuning Machine Translation Output. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Using Log-linear Models for Tuning Machine Translation Output (Carl, LREC 2008)