Sokratis Sofianopoulos........

Also published as: Sokratis Sofianopoulos

2022

pdf abs
Constructing Parallel Corpora from COVID-19 News using MediSys Metadata
Dimitrios Roussis | Vassilis Papavassiliou | Sokratis Sofianopoulos | Prokopis Prokopidis | Stelios Piperidis
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents a collection of parallel corpora generated by exploiting the COVID-19 related dataset of metadata created with the Europe Media Monitor (EMM) / Medical Information System (MediSys) processing chain of news articles. We describe how we constructed comparable monolingual corpora of news articles related to the current pandemic and used them to mine about 11.2 million segment alignments in 26 EN-X language pairs, covering most official EU languages plus Albanian, Arabic, Icelandic, Macedonian, and Norwegian. Subsets of this collection have been used in shared tasks (e.g. Multilingual Semantic Search, Machine Translation) aimed at accelerating the creation of resources and tools needed to facilitate access to information in the COVID-19 emergency situation.

pdf abs
Welocalize-ARC/NKUA’s Submission to the WMT 2022 Quality Estimation Shared Task
Eirini Zafeiridou | Sokratis Sofianopoulos
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents our submission to the WMT 2022 quality estimation shared task and more specifically to the quality prediction sentence-level direct assessment (DA) subtask. We build a multilingual system based on the predictor–estimator architecture by using the XLM-RoBERTa transformer for feature extraction and a regression head on top of the final model to estimate the z-standardized DA labels. Furthermore, we use pretrained models to extract useful knowledge that reflect various criteria of quality assessment and demonstrate good correlation with human judgements. We optimize the performance of our model by incorporating this information as additional external features in the input data and by applying Monte Carlo dropout during both training and inference.

2018

pdf abs
The ILSP/ARC submission to the WMT 2018 Parallel Corpus Filtering Shared Task
Vassilis Papavassiliou | Sokratis Sofianopoulos | Prokopis Prokopidis | Stelios Piperidis
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission of the Institute for Language and Speech Processing/Athena Research and Innovation Center (ILSP/ARC) for the WMT 2018 Parallel Corpus Filtering shared task. We explore several properties of sentences and sentence pairs that our system explored in the context of the task with the purpose of clustering sentence pairs according to their appropriateness in training MT systems. We also discuss alternative methods for ranking the sentence pairs of the most appropriate clusters with the aim of generating the two datasets (of 10 and 100 million words as required in the task) that were evaluated. By summarizing the results of several experiments that were carried out by the organizers during the evaluation phase, our submission achieved an average BLEU score of 26.41, even though it does not make use of any language-specific resources like bilingual lexica, monolingual corpora, or MT output, while the average score of the best participant system was 27.91.

2008

In this paper we describe the METIS-II system and its evaluation on each of the language pairs: Dutch, German, Greek, and Spanish to English. The METIS-II system envisaged developing a data-driven approach in which no parallel corpus is required, and in which no full parser or extensive rule sets are needed. We describe evalution on a development test set and on a test set coming from Europarl, and compare our results with SYSTRAN. We also provide some further analysis, researching the impact of the number and source of the reference translations and analysing the results according to test text type. The results are expectably lower for the METIS system, but not at an unatainable distance from a mature system like SYSTRAN.

2007

pdf
Demonstration of the Greek to English METIS-II system
Sokratis Sofianopoulos | Vassiliki Spilioti | Marina Vassiliou | Olga Yannoutsou | Stella Markantonatou
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2006

pdf
Using Patterns for Machine Translation
Stella Makantonatou | Sokratis Sofianopoulos | Vassiliki Spilioti | George Tambouratzis | Marina Vassiliou | Olga Yannoutsou
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

2005

In the present article, a hybrid approach is proposed for implementing a machine translation system using a large monolingual corpus coupled with a bilingual lexicon and basic NLP tools. In the first phase of the METIS system, a source language (SL) sentence, after being tagged, lemmatised and translated by a flat lemma-to-lemma lexicon, was matched against a tagged and lemmatised target language (TL) corpus using a pattern matching algorithm. In the second phase, translations are generated by combining sub-sentential structures. In this paper, the main features of the second phase are discussed while the system architecture and the corresponding translation approach are presented. The proposed methodology is illustrated with examples of the translation process.

Venues

wmt2
mtsummit2
hytra2
lrec2
eamt2
show all...

tmi1

ssst1

ws1

coling1

acl1

ijcnlp1