Sokratis Sofianopoulos
Also published as: Sokratis Sofianopoulos........
2025
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis | Leon Voukoutis | Georgios Paraskevopoulos | Sokratis Sofianopoulos | Prokopis Prokopidis | Vassilis Papavassileiou | Athanasios Katsamanis | Stelios Piperidis | Vassilis Katsouros
Findings of the Association for Computational Linguistics: EMNLP 2025
We introduce Llama-Krikri-8B, a cutting-edge Large Language Model tailored for the Greek language, built on Meta’s Llama 3.1-8B. Llama-Krikri-8B has been extensively trained on high-quality Greek data to ensure superior adaptation to linguistic nuances. With 8 billion parameters, it offers advanced capabilities while maintaining efficient computational performance. Llama-Krikri-8B supports both Modern Greek and English, and is also equipped to handle polytonic text and Ancient Greek. The chat version of Llama-Krikri-8B features a multi-stage post-training pipeline, utilizing both human and synthetic instruction and preference data, by applying techniques such as MAGPIE. In addition, for evaluation, we propose three novel public benchmarks for Greek. Our evaluation on existing as well as the proposed benchmarks shows notable improvements over comparable Greek and multilingual LLMs in both natural language understanding and generation as well as code generation.
2024
Enhancing Scientific Discourse: Machine Translation for the Scientific Domain
Dimitris Roussis | Sokratis Sofianopoulos | Stelios Piperidis
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
The increasing volume of scientific research necessitates effective communication across language barriers. Machine translation (MT) offers a promising solution for accessing international publications. However, the scientific domain presents unique challenges due to its specialized vocabulary and complex sentence structures. In this paper, we present the development of a collection of parallel and monolingual corpora from the scientific domain. The corpora target the language pairs Spanish-English, French-English, and Portuguese-English. For each language pair, we create a large general scientific corpus as well as four smaller corpora focused on the research domains of Energy Research, Neuroscience, Cancer, and Transportation. To evaluate the quality of these corpora, we utilize them for fine-tuning general-purpose neural machine translation (NMT) systems. We describe the corpus creation process and the fine-tuning strategies employed, and conclude with the evaluation results.
2022
Constructing Parallel Corpora from COVID-19 News using MediSys Metadata
Dimitrios Roussis | Vassilis Papavassiliou | Sokratis Sofianopoulos | Prokopis Prokopidis | Stelios Piperidis
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper presents a collection of parallel corpora generated by exploiting the COVID-19 related dataset of metadata created with the Europe Media Monitor (EMM) / Medical Information System (MediSys) processing chain of news articles. We describe how we constructed comparable monolingual corpora of news articles related to the current pandemic and used them to mine about 11.2 million segment alignments in 26 EN-X language pairs, covering most official EU languages plus Albanian, Arabic, Icelandic, Macedonian, and Norwegian. Subsets of this collection have been used in shared tasks (e.g. Multilingual Semantic Search, Machine Translation) aimed at accelerating the creation of resources and tools needed to facilitate access to information in the COVID-19 emergency situation.
Welocalize-ARC/NKUA’s Submission to the WMT 2022 Quality Estimation Shared Task
Eirini Zafeiridou | Sokratis Sofianopoulos
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents our submission to the WMT 2022 quality estimation shared task and more specifically to the quality prediction sentence-level direct assessment (DA) subtask. We build a multilingual system based on the predictor–estimator architecture by using the XLM-RoBERTa transformer for feature extraction and a regression head on top of the final model to estimate the z-standardized DA labels. Furthermore, we use pretrained models to extract useful knowledge that reflects various criteria of quality assessment and correlates well with human judgements. We optimize the performance of our model by incorporating this information as additional external features in the input data and by applying Monte Carlo dropout during both training and inference.
2013
A Review of the PRESEMT project
George Tambouratzis | Marina Vassiliou | Sokratis Sofianopoulos
Proceedings of Machine Translation Summit XIV: European projects
2011
A resource-light phrase scheme for language-portable MT
George Tambouratzis | Fotini Simistira | Sokratis Sofianopoulos | Nikos Tsimboukakis | Marina Vassiliou
Proceedings of the 15th Annual Conference of the European Association for Machine Translation
2007
Demonstration of the Greek to English METIS-II system
Sokratis Sofianopoulos | Vassiliki Spilioti | Marina Vassiliou | Olga Yannoutsou | Stella Markantonatou
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers
2006
Using Patterns for Machine Translation
Stella Makantonatou | Sokratis Sofianopoulos | Vassiliki Spilioti | George Tambouratzis | Marina Vassiliou | Olga Yannoutsou
Proceedings of the 11th Annual Conference of the European Association for Machine Translation
2005
Monolingual Corpus-based MT Using Chunks
Stella Markantonatou | Sokratis Sofianopoulos | Vassiliki Spilioti | Yiorgos Tambouratzis | Marina Vassiliou | Olga Yannoutsou | Nikos Ioannou
Workshop on example-based machine translation
In the present article, a hybrid approach is proposed for implementing a machine translation system using a large monolingual corpus coupled with a bilingual lexicon and basic NLP tools. In the first phase of the METIS system, a source language (SL) sentence, after being tagged, lemmatised and translated by a flat lemma-to-lemma lexicon, was matched against a tagged and lemmatised target language (TL) corpus using a pattern matching algorithm. In the second phase, translations are generated by combining sub-sentential structures. In this paper, the main features of the second phase are discussed while the system architecture and the corresponding translation approach are presented. The proposed methodology is illustrated with examples of the translation process.
Co-authors
- Marina Vassiliou 11
- George Tambouratzis 8
- Stelios Piperidis 5
- Olga Yannoutsou 4
- Stella Markantonatou 3
- Prokopis Prokopidis 3
- Vassiliki Spilioti 3
- Vassilis Papavassiliou 2
- Dimitris Roussis 2
- Toni Badia 1
- Juli Bakagianni 1
- Gemma Boleda 1
- Michael Carl 1
- Peter Dirix 1
- Dimitrios Galanis 1
- Nikos Ioannou 1
- Athanasios Katsamanis 1
- Vassilis Katsouros 1
- Stella Makantonatou 1
- Maite Melero 1
- Vassilis Papavassileiou 1
- Georgios Paraskevopoulos 1
- Dimitrios Roussis 1
- Paul Schmidt 1
- Ineke Schuurman 1
- Fotini Simistira 1
- Yiorgos Tambouratzis 1
- Nikos Tsimboukakis 1
- Vincent Vandeghinste 1
- Leon Voukoutis 1
- Eirini Zafeiridou 1