Marcus Sammer


2007

Lexical translation with application to image searching on the web
Oren Etzioni | Kobi Reiter | Stephen Soderland | Marcus Sammer
Proceedings of Machine Translation Summit XI: Papers

Building a sense-distinguished multilingual lexicon from monolingual corpora and bilingual lexicons
Marcus Sammer | Stephen Soderland
Proceedings of Machine Translation Summit XI: Papers

2006

Ambiguity Reduction for Machine Translation: Human-Computer Collaboration
Marcus Sammer | Kobi Reiter | Stephen Soderland | Katrin Kirchhoff | Oren Etzioni
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Statistical Machine Translation (SMT) accuracy degrades when only a limited amount of training data is available, or when the training data does not come from the same domain or genre of text as the target application. However, cross-domain applications are typical of many real-world tasks. We demonstrate that SMT accuracy can be improved in a cross-domain application by using a controlled language (CL) interface to help reduce lexical ambiguity in the input text. Our system, CL-MT, presents a monolingual user with a choice of word senses for each content word in the input text. CL-MT temporarily adjusts the underlying SMT system's phrase table, boosting the scores of translations that include the word senses preferred by the user and lowering scores for disfavored translations. We demonstrate that this improves translation adequacy in 33.8% of the sentences in Spanish-to-English translation of news stories, where the SMT system was trained on proceedings of the European Parliament.
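
The phrase-table adjustment described in the abstract can be illustrated with a minimal sketch. This is not the CL-MT implementation: the dictionary-based table layout, the function name `adjust_phrase_table`, the multiplicative `boost` and `penalty` factors, and the renormalization step are all assumptions made for illustration, since the abstract does not specify how scores are changed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical phrase table layout: source phrase -> [(target phrase, score)].
PhraseTable = Dict[str, List[Tuple[str, float]]]


def adjust_phrase_table(
    table: PhraseTable,
    source_word: str,
    preferred: Set[str],
    disfavored: Set[str],
    boost: float = 2.0,     # assumed factor, not taken from the paper
    penalty: float = 0.5,   # assumed factor, not taken from the paper
) -> PhraseTable:
    """Return a temporary, rescored copy of the table; the input is untouched."""
    adjusted: PhraseTable = {}
    for src, candidates in table.items():
        # Only source phrases containing the ambiguous word are rescored.
        if source_word not in src.split():
            adjusted[src] = list(candidates)
            continue
        rescored = []
        for tgt, score in candidates:
            tgt_words = set(tgt.split())
            if tgt_words & preferred:
                score *= boost      # translation uses a sense the user chose
            elif tgt_words & disfavored:
                score *= penalty    # translation uses a sense the user rejected
            rescored.append((tgt, score))
        # Renormalize so the scores for this source phrase sum to 1 again.
        total = sum(s for _, s in rescored)
        if total > 0:
            rescored = [(t, s / total) for t, s in rescored]
        adjusted[src] = rescored
    return adjusted


# Toy example: the user indicates that Spanish "banco" means a seat here.
table = {
    "banco": [("bank", 0.6), ("bench", 0.4)],
    "el banco": [("the bank", 0.7), ("the bench", 0.3)],
    "casa": [("house", 1.0)],
}
temp = adjust_phrase_table(table, "banco", {"bench"}, {"bank"})
best = max(temp["banco"], key=lambda pair: pair[1])[0]
```

Returning a copy rather than mutating the table mirrors the "temporarily adjusts" wording: the original scores remain available for the next input sentence.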