Kostadin Cholakov


2016

TraMOOC (Translation for Massive Open Online Courses): providing reliable MT for MOOCs
Valia Kordoni | Lexi Birch | Ioana Buliga | Kostadin Cholakov | Markus Egg | Federico Gaspari | Yota Georgakopolou | Maria Gialama | Iris Hendrickx | Mitja Jermol | Katia Kermanidis | Joss Moorkens | Davor Orlic | Michael Papadopoulos | Maja Popović | Rico Sennrich | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Menno van Zaanen | Andy Way
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

Proceedings of the 12th Workshop on Multiword Expressions
Valia Kordoni | Kostadin Cholakov | Markus Egg | Stella Markantonatou | Preslav Nakov
Proceedings of the 12th Workshop on Multiword Expressions

Using Word Embeddings for Improving Statistical Machine Translation of Phrasal Verbs
Kostadin Cholakov | Valia Kordoni
Proceedings of the 12th Workshop on Multiword Expressions

Enlarging Scarce In-domain English-Croatian Corpus for SMT of MOOCs Using Serbian
Maja Popović | Kostadin Cholakov | Valia Kordoni | Nikola Ljubešić
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

Massive Open Online Courses have been growing rapidly in size and impact. Yet the language barrier constitutes a major growth impediment in reaching out to all people and educating all citizens. A vast majority of educational material is available only in English, and state-of-the-art machine translation systems still have not been tailored for this peculiar genre. In addition, even collecting appropriate in-domain training material is a challenging task. In this work, we investigate statistical machine translation of lecture subtitles from English into Croatian, which is morphologically rich and generally weakly supported, especially for the educational domain. We show that results comparable with publicly available systems trained on much larger data can be achieved if a small in-domain training set is used in combination with an additional in-domain corpus originating from the closely related Serbian language.

Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
Valia Kordoni | Antal van den Bosch | Katia Lida Kermanidis | Vilelmini Sosoni | Kostadin Cholakov | Iris Hendrickx | Matthias Huck | Andy Way
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The present work is an overview of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, a machine translation approach for online educational content. More specifically, video lectures, assignments, and MOOC forum text are automatically translated from English into eleven European and BRIC languages. Unlike previous approaches to machine translation, the output quality in TraMOOC relies on a multimodal evaluation schema that involves crowdsourcing, error type markup, an error taxonomy for translation model comparison, and implicit evaluation via text mining, i.e. entity recognition and its performance comparison between the source and the translated text, and sentiment analysis on the students’ forum posts. Finally, the evaluation output will result in more and better quality in-domain parallel data that will be fed back to the translation engine for higher quality output. The translation service will be incorporated into the Iversity MOOC platform and into the VideoLectures.net digital library portal.

2015

TraMOOC: Translation for Massive Open Online Courses
Valia Kordoni | Kostadin Cholakov | Markus Egg | Andy Way | Lexi Birch | Katia Kermanidis | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Iris Hendrickx | Michael Papadopoulos | Panayota Georgakopoulou | Maria Gialama | Menno van Zaanen | Ioana Buliga | Mitja Jermol | Davor Orlic
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Lexical Substitution Dataset for German
Kostadin Cholakov | Chris Biemann | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article describes a lexical substitution dataset for German. The whole dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence. There are 51 target nouns, 51 adjectives, and 51 verbs randomly selected from 3 frequency groups based on the lemma frequency list of the German WaCKy corpus. 200 sentences have been annotated by 4 professional annotators and the remaining sentences by 1 professional annotator and 5 additional annotators who have been recruited via crowdsourcing. The resulting dataset can be used to evaluate not only lexical substitution systems, but also different sense inventories and word sense disambiguation systems.

Better Statistical Machine Translation through Linguistic Treatment of Phrasal Verbs
Kostadin Cholakov | Valia Kordoni
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Automated Verb Sense Labelling Based on Linked Lexical Resources
Kostadin Cholakov | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2011

An Empirical Comparison of Unknown Word Prediction Methods
Kostadin Cholakov | Gertjan van Noord | Valia Kordoni | Yi Zhang
Proceedings of 5th International Joint Conference on Natural Language Processing

Adaptability of Lexical Acquisition for Large-scale Grammars
Kostadin Cholakov | Gertjan van Noord | Valia Kordoni | Yi Zhang
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Using Unknown Word Techniques to Learn Known Words
Kostadin Cholakov | Gertjan van Noord
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Acquisition of Unknown Word Paradigms for Large-Scale Grammars
Kostadin Cholakov | Gertjan van Noord
Coling 2010: Posters

2009

Combining Finite State and Corpus-based Techniques for Unknown Word Prediction
Kostadin Cholakov | Gertjan van Noord
Proceedings of the International Conference RANLP-2009

2008

Towards Domain-Independent Deep Linguistic Processing: Ensuring Portability and Re-Usability of Lexicalised Grammars
Kostadin Cholakov | Valia Kordoni | Yi Zhang
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks