Athanasios Katsamanis
2025
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis | Leon Voukoutis | Georgios Paraskevopoulos | Sokratis Sofianopoulos | Prokopis Prokopidis | Vassilis Papavassileiou | Athanasios Katsamanis | Stelios Piperidis | Vassilis Katsouros
Findings of the Association for Computational Linguistics: EMNLP 2025
We introduce Llama-Krikri-8B, a cutting-edge Large Language Model tailored for the Greek language, built on Meta’s Llama 3.1-8B. Llama-Krikri-8B has been extensively trained on high-quality Greek data to ensure superior adaptation to linguistic nuances. With 8 billion parameters, it offers advanced capabilities while maintaining efficient computational performance. Llama-Krikri-8B supports both Modern Greek and English, and is also equipped to handle polytonic text and Ancient Greek. The chat version of Llama-Krikri-8B features a multi-stage post-training pipeline, utilizing both human and synthetic instruction and preference data, by applying techniques such as MAGPIE. In addition, for evaluation, we propose three novel public benchmarks for Greek. Our evaluation on existing as well as the proposed benchmarks shows notable improvements over comparable Greek and multilingual LLMs in both natural language understanding and generation as well as code generation.
2023
ASR pipeline for low-resourced languages: A case study on Pomak
Chara Tsoukala | Kosmas Kritsis | Ioannis Douros | Athanasios Katsamanis | Nikolaos Kokkas | Vasileios Arampatzakis | Vasileios Sevetlidis | Stella Markantonatou | George Pavlidis
Proceedings of the Second Workshop on NLP Applications to Field Linguistics
Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages can be a challenging task not only due to lack of resources but also due to the work required for the preparation of a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.
Co-authors
- Priti Aggarwal 1
- Vasileios Arampatzakis 1
- Ron Artstein 1
- Ioannis Douros 1
- Jillian Gerten 1
- Vassilis Katsouros 1
- Nikolaos Kokkas 1
- Kosmas Kritsis 1
- Stella Markantonatou 1
- Shrikanth Narayanan 1
- Angela Nazarian 1
- Vassilis Papavassileiou 1
- Georgios Paraskevopoulos 1
- George Pavlidis 1
- Stelios Piperidis 1
- Prokopis Prokopidis 1
- Dimitris Roussis 1
- Vasileios Sevetlidis 1
- Sokratis Sofianopoulos 1
- David Traum 1
- Chara Tsoukala 1
- Leon Voukoutis 1