2021
NoDeeLe: A Novel Deep Learning Schema for Evaluating Neural Machine Translation Systems
Despoina Mouratidis
|
Maria Stasimioti
|
Vilelmini Sosoni
|
Katia Lida Kermanidis
Proceedings of the Translation and Interpreting Technology Online Conference
Due to the widespread development of Machine Translation (MT) systems, especially Neural Machine Translation (NMT) systems, MT evaluation, both automatic and human, has become increasingly important, as it helps us establish how MT systems perform. Yet automatic evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU, METEOR and ROUGE) may correlate poorly with human judgments. This paper puts to the test an evaluation model based on a novel deep learning schema (NoDeeLe), used to compare two NMT systems on four different text genres, i.e. medical, legal, marketing and literary, in the English-Greek language pair. The model utilizes information from the source segments, the MT outputs and the reference translation, as well as the automatic metrics BLEU, METEOR and WER. The proposed schema achieves a strong correlation with human judgment (78% average accuracy over the four texts, with the highest accuracy, 85%, observed for the marketing text), while it outperforms classic machine learning algorithms and automatic metrics.
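As a rough illustration of one of the automatic metrics named in the abstract above, word error rate (WER) is commonly computed as the word-level edit distance between an MT output and its reference, divided by the reference length. The sketch below follows that common definition; the function name `wer` and the whitespace tokenization are assumptions made for illustration and are not taken from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Assumes plain whitespace tokenization (an illustrative simplification).
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # reference word missing from hypothesis
                            curr[j - 1] + 1,          # extra word in hypothesis
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A lower value means the output is closer to the reference; the value can exceed 1 when the hypothesis is much longer than the reference.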
2020
Machine Translation Quality: A comparative evaluation of SMT, NMT and tailored-NMT outputs
Maria Stasimioti
|
Vilelmini Sosoni
|
Katia Kermanidis
|
Despoina Mouratidis
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
The present study aims to compare three systems: a generic statistical machine translation (SMT) system, a generic neural machine translation (NMT) system and a tailored-NMT system, focusing on the English-to-Greek language pair. The comparison is carried out following a mixed-methods approach, i.e. automatic metrics, as well as side-by-side ranking, adequacy and fluency rating, measurement of actual post-editing (PE) effort and human error analysis performed by 16 postgraduate Translation students. The findings reveal a higher score for both the generic NMT and the tailored-NMT outputs as regards automatic metrics and human evaluation metrics, with the tailored-NMT output faring even better than the generic NMT output.
A Supervised Part-Of-Speech Tagger for the Greek Language of the Social Web
Maria Nefeli Nikiforos
|
Katia Lida Kermanidis
Proceedings of the Twelfth Language Resources and Evaluation Conference
The increasing volume of communication via microblogging messages on social networks has created the need for efficient Natural Language Processing (NLP) tools, especially for unstructured text processing. Extracting information from unstructured social text is one of the most demanding NLP tasks. This paper presents the first part-of-speech tagged data set of social text in Greek, as well as the first supervised part-of-speech tagger developed for such data sets.
2019
Comparing a Hand-crafted to an Automatically Generated Feature Set for Deep Learning: Pairwise Translation Evaluation
Despoina Mouratidis
|
Katia Lida Kermanidis
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)
The automatic evaluation of machine translation (MT) has proven to be a very significant research topic. Most automatic evaluation methods focus on the evaluation of the output of MT, as they compute similarity scores that represent translation quality. This work targets the performance of MT evaluation. We present a general scheme for learning to classify parallel translations, using linguistic information, of two MT model outputs and one human (reference) translation. We present three experiments with this scheme using neural networks (NN): the first uses string-based hand-crafted features (Exp1); the second uses automatically trained embeddings from the reference and the two MT outputs (one from a statistical machine translation (SMT) model and the other from a neural machine translation (NMT) model), which are learned using NN (Exp2); and the third (Exp3) combines information from the other two experiments. The languages involved are English (EN), Greek (GR) and Italian (IT), and the segments are educational in domain. The proposed language-independent learning scheme that combines information from the two experiments (Exp3) achieves higher classification accuracy compared with models using BLEU score information, as well as with other classification approaches, such as Random Forest (RF) and Support Vector Machine (SVM).
2018
A Multilingual Wikified Data Set of Educational Material
Iris Hendrickx
|
Eirini Takoulidou
|
Thanasis Naskos
|
Katia Lida Kermanidis
|
Vilelmini Sosoni
|
Hugo de Vos
|
Maria Stasimioti
|
Menno van Zaanen
|
Panayota Georgakopoulou
|
Valia Kordoni
|
Maja Popovic
|
Markus Egg
|
Antal van den Bosch
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content
Vilelmini Sosoni
|
Katia Lida Kermanidis
|
Maria Stasimioti
|
Thanasis Naskos
|
Eirini Takoulidou
|
Menno van Zaanen
|
Sheila Castilho
|
Panayota Georgakopoulou
|
Valia Kordoni
|
Markus Egg
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Improving Machine Translation of Educational Content via Crowdsourcing
Maximiliana Behnke
|
Antonio Valerio Miceli Barone
|
Rico Sennrich
|
Vilelmini Sosoni
|
Thanasis Naskos
|
Eirini Takoulidou
|
Maria Stasimioti
|
Menno van Zaanen
|
Sheila Castilho
|
Federico Gaspari
|
Panayota Georgakopoulou
|
Valia Kordoni
|
Markus Egg
|
Katia Lida Kermanidis
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
Valia Kordoni
|
Antal van den Bosch
|
Katia Lida Kermanidis
|
Vilelmini Sosoni
|
Kostadin Cholakov
|
Iris Hendrickx
|
Matthias Huck
|
Andy Way
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The present work is an overview of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, a machine translation approach for online educational content. More specifically, video lectures, assignments, and MOOC forum text are automatically translated from English into eleven European and BRIC languages. Unlike previous approaches to machine translation, the output quality in TraMOOC relies on a multimodal evaluation schema that involves crowdsourcing, error type markup, an error taxonomy for translation model comparison, and implicit evaluation via text mining, i.e. entity recognition and its performance comparison between the source and the translated text, and sentiment analysis on the students’ forum posts. Finally, the evaluation output will result in more and better-quality in-domain parallel data that will be fed back to the translation engine for higher-quality output. The translation service will be incorporated into the Iversity MOOC platform and into the VideoLectures.net digital library portal.
TraMOOC (Translation for Massive Open Online Courses): providing reliable MT for MOOCs
Valia Kordoni
|
Lexi Birch
|
Ioana Buliga
|
Kostadin Cholakov
|
Markus Egg
|
Federico Gaspari
|
Yota Georgakopolou
|
Maria Gialama
|
Iris Hendrickx
|
Mitja Jermol
|
Katia Kermanidis
|
Joss Moorkens
|
Davor Orlic
|
Michael Papadopoulos
|
Maja Popović
|
Rico Sennrich
|
Vilelmini Sosoni
|
Dimitrios Tsoumakos
|
Antal van den Bosch
|
Menno van Zaanen
|
Andy Way
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products
2015
TraMOOC: Translation for Massive Open Online Courses
Valia Kordoni
|
Kostadin Cholakov
|
Markus Egg
|
Andy Way
|
Lexi Birch
|
Katia Kermanidis
|
Vilelmini Sosoni
|
Dimitrios Tsoumakos
|
Antal van den Bosch
|
Iris Hendrickx
|
Michael Papadopoulos
|
Panayota Georgakopoulou
|
Maria Gialama
|
Menno van Zaanen
|
Ioana Buliga
|
Mitja Jermol
|
Davor Orlic
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
2008
Eksairesis: A Domain-Adaptable System for Ontology Building from Unstructured Text
Katia Lida Kermanidis
|
Aristomenis Thanopoulos
|
Manolis Maragoudakis
|
Nikos Fakotakis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper describes Eksairesis, a system for learning economic domain knowledge automatically from Modern Greek text. The knowledge is in the form of economic terms and the semantic relations that govern them. The entire process is based on the use of minimal language-dependent tools, no external linguistic resources, and merely free, unstructured text. The methodology is thereby easily portable to other domains and other languages. The text is pre-processed with basic morphological annotation, and semantic (named and other) entities are identified using supervised learning techniques. Statistical filtering, i.e. corpora comparison, is used to extract domain terms, and supervised learning is again employed to detect the semantic relations between pairs of terms. Advanced classification schemata, ensemble learning, and one-sided sampling are experimented with in order to deal with the noise in the data, which is unavoidable due to the low pre-processing level and the lack of sophisticated resources. An average f-score of 68.5% over all the classes is achieved when learning semantic relations. Bearing in mind the use of minimal resources and the highly automated nature of the process, classification performance is very promising compared to results reported in previous work.
2006
Dealing with Imbalanced Data using Bayesian Techniques
Manolis Maragoudakis
|
Katia Kermanidis
|
Aristogiannis Garbis
|
Nikos Fakotakis
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In the present work, we deal with the significant problem of high data imbalance in binary or multi-class classification problems. We study two different linguistic applications. The former determines whether a syntactic construction (environment) that co-occurs with a verb in a natural text corpus constitutes a subcategorization frame of the verb or not. The latter is Named Entity Recognition (NER), which concerns determining whether a noun belongs to a specific named entity class. Regarding the subcategorization domain, each environment is encoded as a vector of heterogeneous attributes, where a very high imbalance between positive and negative examples is observed (an imbalance ratio of approximately 1:80). In the NER application, the imbalance between a named entity class and the negative class is even greater (1:120). In order to confront the plethora of negative instances, we suggest a search tactic during the training phase that employs Tomek links to remove unnecessary negative examples from the training set. Regarding the classification mechanism, we argue that Bayesian networks are well suited, and we propose a novel network structure that efficiently handles heterogeneous attributes without discretization and is more classification-oriented. Compared with other known machine learning algorithms, our methodology performs significantly better in detecting examples of the rare class.
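To make the Tomek-link idea mentioned in the abstract above more concrete: under the usual definition, two examples of different classes form a Tomek link when each is the other's nearest neighbour, and the negative (majority) member of each link can then be dropped from the training set. The sketch below follows that textbook definition on purely numeric vectors with Euclidean distance; it is not the authors' search tactic, and the function names and the `negative` label parameter are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different class labels."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def drop_negative_tomek(points, labels, negative=0):
    """Remove the negative-class member of every Tomek link
    (illustrative undersampling step; assumes at least two points)."""
    drop = {i if labels[i] == negative else j
            for i, j in tomek_links(points, labels)}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 6.0), (10.0, 10.0)]
    lbl = [0, 1, 0, 0, 0]  # 1 = rare positive class, 0 = negative class
    print(drop_negative_tomek(pts, lbl))
```

The abstract describes heterogeneous attributes, so a real application would need a distance measure suited to mixed data; Euclidean distance is used here only to keep the sketch short.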
2004
Learning Greek Verb Complements: Addressing the Class Imbalance
Katia Kermanidis
|
Manolis Maragoudakis
|
Nikos Fakotakis
|
George Kokkinakis
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics
2002
Combining Bayesian and Support Vector Machines Learning to automatically complete Syntactical Information for HPSG-like Formalisms
Manolis Maragoudakis
|
Katia Kermanidis
|
Nikos Fakotakis
|
George Kokkinakis
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
DELOS: An Automatically Tagged Economic Corpus for Modern Greek
Katia Lida Kermanidis
|
Nikos Fakotakis
|
George Kokkinakis
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)