2016
TÜBİTAK SMT System Submission for WMT2016
Emre Bektaş, Ertuğrul Yilmaz, Coşkun Mermer, İlknur Durgar El-Kahlout
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
2013
TÜBİTAK-BİLGEM German-English Machine Translation Systems for W13
İlknur Durgar El-Kahlout, Coşkun Mermer
Proceedings of the Eighth Workshop on Statistical Machine Translation
TÜBİTAK Turkish-English submissions for IWSLT 2013
Ertuğrul Yılmaz, İlknur Durgar El-Kahlout, Burak Aydın, Zişan Sıla Özil, Coşkun Mermer
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the TÜBİTAK Turkish-English submissions in both directions for the IWSLT’13 Evaluation Campaign TED Machine Translation (MT) track. We develop both phrase-based and hierarchical phrase-based statistical machine translation (SMT) systems based on Turkish word- and morpheme-level representations. We augment the training data with content words extracted from it and experiment with reversed word order for the source languages. For the Turkish-to-English direction, we use the Gigaword corpus as an additional language model alongside the training data. For the English-to-Turkish direction, we implement a wide-coverage Turkish word generator that generates words from stem and morpheme sequences. Finally, we perform system combination over systems built with different word alignments.
2012
The TÜBİTAK statistical machine translation system for IWSLT 2012
Coşkun Mermer, Hamza Kaya, İlknur Durgar El-Kahlout, Mehmet Uğur Doğan
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
We describe the TÜBİTAK submission to the IWSLT 2012 Evaluation Campaign. Our system development focused on using Bayesian alignment methods such as variational Bayes and Gibbs sampling in addition to the standard GIZA++ alignments. The submitted tracks are the Arabic-English and Turkish-English TED Talks translation tasks.
2011
Bayesian Word Alignment for Statistical Machine Translation
Coşkun Mermer, Murat Saraçlar
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
2010
Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
Coşkun Mermer
Proceedings of the ACL 2010 Student Research Workshop
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2010
Coskun Mermer, Hamza Kaya, Mehmet Uğur Doğan
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
We report on our participation in the IWSLT 2010 evaluation campaign. As in previous years, our submitted systems are based on the Moses statistical machine translation toolkit. This year, we also experimented with hierarchical phrase-based models. In addition, we used automatic minimum error-rate training instead of manually guided tuning. We focused more on the BTEC Turkish-English task and ran various experiments with unsupervised segmentation to measure their effect on translation performance. We present the results of several contrastive experiments, including those that failed to improve translation performance.
2009
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2009
Coşkun Mermer, Hamza Kaya, Mehmet Uğur Doğan
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign
We describe our Arabic-to-English and Turkish-to-English machine translation systems that participated in the IWSLT 2009 evaluation campaign. Both systems are based on the Moses statistical machine translation toolkit, with added components to address the rich morphology of the source languages. Three different morphological approaches are investigated for Turkish. Our primary submission, the approach with the best translation performance, uses linguistic morphological analysis and statistical disambiguation to generate morpheme-based translation models. One contrastive submission uses unsupervised subword segmentation to generate non-linguistic subword-based translation models, while another uses word-based models but applies lexical approximation to cope with out-of-vocabulary words, similar to the approach in our Arabic-to-English submission.
2008
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2008
Coşkun Mermer, Hamza Kaya, Ömer Farukhan Güneş, Mehmet Uğur Doğan
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign
We present the TÜBİTAK-UEKAE statistical machine translation system that participated in the IWSLT 2008 evaluation campaign. Our system is based on the open-source phrase-based statistical machine translation software Moses. Additionally, phrase-table augmentation is applied to maximize source language coverage; lexical approximation is applied to replace out-of-vocabulary words with known words prior to decoding; and automatic punctuation insertion is improved. We describe the preprocessing and postprocessing steps and our training and decoding procedures. We present results for the classic Arabic-English and Chinese-English tasks as well as the new Chinese-Spanish direct and Chinese-English-Spanish pivot translation tasks.
2007
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2007
Coşkun Mermer, Hamza Kaya, Mehmet Uğur Doğan
Proceedings of the Fourth International Workshop on Spoken Language Translation
We describe the TÜBİTAK-UEKAE system that participated in the Arabic-to-English and Japanese-to-English translation tasks of the IWSLT 2007 evaluation campaign. Our system is built on the open-source phrase-based statistical machine translation software Moses. Of the available corpora and linguistic resources, only the supplied training data and an Arabic morphological analyzer are used in the system. We present the run-time lexical approximation method to cope with out-of-vocabulary words during decoding. We tested our system under both automatic speech recognition (ASR) and clean transcript (clean) input conditions. Our system was ranked first in both the Arabic-to-English and Japanese-to-English tasks under the “clean” condition.