Voula Giouli
Also published as: V. Giouli
2026
Universal NER v2: Towards a Massively Multilingual Named Entity Recognition Benchmark
Terra Blevins | Stephen Mayhew | Marek Suppa | Hila Gonen | Shachar Mirkin | Vasile Pais | Kaja Dobrovoljc Zor | Voula Giouli | Jun Kevin | Eugene Jang | Eungseo Kim | Jeongyeon Seo | Xenophon Gialis | Yuval Pinter
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present Universal NER (UNER) v2, a significant extension of the initial version released in 2024. UNER is a collaborative dataset for multilingual named-entity annotations, built to support research on NER methods in a cross-linguistic setting. UNER v2 adds 11 new datasets in 10 typologically varied languages to the resource, including multiple parallel evaluation benchmarks aligned with each other and other datasets in UNER v1, while maintaining the same annotation guidelines and high standards for inter-annotator agreement. We report detailed statistics for the dataset and benchmark UNER v2 using both encoder-based model architectures and LLMs.
A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding
Dilara Torunoğlu-Selamet | Doğukan Arslan | Rodrigo Wilkens | Wei He | Doruk Eryiğit | Thomas Pickard | Adriana S. Pagano | Aline Villavicencio | Gülşen Eryiğit | Ágnes Abuczki | Aida Cardoso | Alesia Lazarenka | Dina Almassova | Amália Mendes | Anna Kanellopoulou | Antoni Brosa-Rodriguez | Baiba Valkovska | Beata Wojtowicz | Bolette Pedersen | Carlos Manuel Hidalgo-Ternero | Chaya Liebeskind | Danka Jokić | Diego Alves | Eleni Triantafyllidi | Erik Velldal | Fred Philippy | Giedre Valunaite Oleskeviciene | Ieva Rizgeliene | Inguna Skadina | Irina Lobzhanidze | Isabell Stinessen Haugen | Jauza Akbar Krito | Jelena M. Marković | Johanna Monti | Josue Alejandro Sauca | Kaja Dobrovoljc Zor | Kingsley O. Ugwuanyi | Laura Rituma | Lilja Øvrelid | Maha Tufail Agro | Manzura Abjalova | Maria Chatzigrigoriou | María del Mar Sánchez Ramos | Marija Pendevska | Masoumeh Seyyedrezaei | Mehrnoush Shamsfard | Momina Ahsan | Muhammad Ahsan Riaz Khan | Nathalie Carmen Hau Norman | Nilay Erdem Ayyıldız | Nina Hosseini-Kivanani | Noémi Ligeti-Nagy | Numaan Naeem | Olha Kanishcheva | Olha Yatsyshyna | Daniil Orel | Petra Giommarelli | Petya Osenova | Radovan Garabik | Regina E. Semou | Rozane Rebechi | Salsabila Zahirah Pranida | Samia Touileb | Sanni Nimb | Sarfraz Ahmad | Sarvinoz Sharipova | Shahar Golan | Shaoxiong Ji | Sopuruchi Christian Aboh | Srdjan Sucur | Stella Markantonatou | Sussi Olsen | Vahide Tajalli | Veronika Lipp | Voula Giouli | Yelda Yeşildal Eraydın | Zahra Saaberi | Zhuohan Xie
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Potentially idiomatic expressions (PIEs) carry meanings inherently tied to the everyday experience of a given language community. As such, they constitute an interesting challenge for assessing the linguistic (and to some extent cultural) capabilities of NLP systems. In this paper, we present XMPIE, a parallel multilingual and multimodal dataset of potentially idiomatic expressions. The dataset, containing 34 languages and over ten thousand items, allows comparative analyses of idiomatic patterns among language-specific realisations and preferences in order to gather insights about shared cultural aspects. This parallel dataset allows evaluating language model performance for a given PIE in different languages and testing whether idiomatic understanding in one language can be transferred to another. Moreover, the dataset supports the study of PIEs across textual and visual modalities, to measure to what extent PIE understanding in one modality transfers to, or implies, understanding in the other (text vs. image). The data was created by language experts, with both textual and visual components crafted under multilingual guidelines, and each PIE is accompanied by five images representing a spectrum from idiomatic to literal meanings, including semantically related and random distractors. The result is a high-quality benchmark for evaluating multilingual and multimodal idiomatic language understanding.
PARSEME 2.0 Multilingual Corpus of Multiword Expressions
Agata Savary | Manon Scholivet | Carlos Ramisch | Takuya Nakamura | Eric Bilinski | Sara Stymne | Voula Giouli | Stella Markantonatou | Vasile Pais | Maria Mitrofan | Louis Estève | Bruno Guillaume | Verginica Barbu Mititelu | Jaka Čibej | Roberto Díaz Hernández | Victoria Fendel | Polona Gantar | Olha Kanishcheva | Cvetana Krstev | Chaya Liebeskind | Irina Lobzhanidze | Aleksandra M. Marković | Gunta Nešpore-Bērzkalne | Adriana S. Pagano | Mehrnoush Shamsfard | Ranka Stankovic | Vahide Tajalli | Carole Tiberius | Aakanksha Padhye
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present edition 2.0 of the PARSEME multilingual corpus annotated for multiword expressions (MWEs), resulting from efforts of the PARSEME community towards universality-driven modeling of idiomaticity. With respect to previous editions, we extend the annotation scope to all syntactic MWE categories: verbal, nominal, adjectival, adverbial and functional. We cover 17 languages, of which 7 are new. The annotation process is based on cross-lingually unified guidelines, phrased as decision diagrams over linguistic tests, and a typology of 18 MWE categories. The corpus contains almost 5 million tokens, over 250,000 sentences and 140,000 MWE annotations. The applicability of the corpus is tested in baseline experiments with a prompt-based MWE identification system. Results show that generic large language models do not encode sufficient knowledge to solve the MWE identification task.
2025
Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025)
Atul Kr. Ojha | Voula Giouli | Verginica Barbu Mititelu | Mathieu Constant | Gražina Korvel | A. Seza Doğruöz | Alexandre Rademaker
Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025)
Survey on Lexical Resources Focused on Multiword Expressions for the Purposes of NLP
Verginica Mititelu | Voula Giouli | Gražina Korvel | Chaya Liebeskind | Irina Lobzhanidze | Rusudan Makhachashvili | Stella Markantonatou | Aleksandra Markovic | Ivelina Stoyanova
Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025)
Lexica of MWEs have always been a valuable resource for various NLP tasks. This paper presents the results of a comprehensive survey on multiword lexical resources that extends a previous one from 2016 to the present. We analyze a diverse set of lexica across multiple languages, reporting on aspects such as creation date, intended usage, languages covered and linguality type, content, acquisition method, accessibility, and linkage to other language resources. Our findings highlight trends in MWE lexicon development focusing on the representation level of languages. This survey aims to support future efforts in creating MWE lexica for NLP applications by identifying these gaps and opportunities.
2024
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Archna Bhatia | Gosse Bouma | A. Seza Doğruöz | Kilian Evang | Marcos Garcia | Voula Giouli | Lifeng Han | Joakim Nivre | Alexandre Rademaker
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy, and the Lexicon-Corpus Interface
Verginica Barbu Mititelu | Voula Giouli | Kilian Evang | Daniel Zeman | Petya Osenova | Carole Tiberius | Simon Krek | Stella Markantonatou | Ivelina Stoyanova | Ranka Stanković | Christian Chiarcos
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
We present ongoing work towards defining a lexicon-corpus interface to serve as a benchmark in the representation of multiword expressions (of various parts of speech) in dedicated lexica and the linking of these entries to their corpus occurrences. The final aim is to harness such resources for the automatic identification of multiword expressions in a text. Several natural languages are involved so that the solution is universal rather than centered on a particular language, while still accommodating idiosyncrasies. We discuss challenges in the lexicographic description of multiword expressions, outline the current status of lexica dedicated to this linguistic phenomenon, and present the solution we envisage for creating an ecosystem of interlinked lexica and corpora containing and, respectively, annotated with multiword expressions.
UniDive: A COST Action on Universality, Diversity and Idiosyncrasy in Language Technology
Agata Savary | Daniel Zeman | Verginica Barbu Mititelu | Anabela Barreiro | Olesea Caftanatov | Marie-Catherine de Marneffe | Kaja Dobrovoljc | Gülşen Eryiğit | Voula Giouli | Bruno Guillaume | Stella Markantonatou | Nurit Melnik | Joakim Nivre | Atul Kr. Ojha | Carlos Ramisch | Abigail Walsh | Beata Wójtowicz | Alina Wróblewska
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
This paper presents the objectives, organization and activities of the UniDive COST Action, a scientific network dedicated to universality, diversity and idiosyncrasy in language technology. We describe the objectives and organization of this initiative, the people involved, the working groups and the ongoing tasks and activities. This paper is also an open call for participation addressed to new members and countries.
2023
PARSEME corpus release 1.3
Agata Savary | Cherifa Ben Khelil | Carlos Ramisch | Voula Giouli | Verginica Barbu Mititelu | Najet Hadj Mohamed | Cvetana Krstev | Chaya Liebeskind | Hongzhi Xu | Sara Stymne | Tunga Güngör | Thomas Pickard | Bruno Guillaume | Eduard Bejček | Archna Bhatia | Marie Candito | Polona Gantar | Uxoa Iñurrieta | Albert Gatt | Jolanta Kovalevskaite | Timm Lichte | Nikola Ljubešić | Johanna Monti | Carla Parra Escartín | Mehrnoush Shamsfard | Ivelina Stoyanova | Veronika Vincze | Abigail Walsh
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus now covers 26 languages. All monolingual corpora therein use the Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts emphasis on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
Archna Bhatia | Kilian Evang | Marcos Garcia | Voula Giouli | Lifeng Han | Shiva Taslimipoor
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
2022
Placing multi-modal, and multi-lingual Data in the Humanities Domain on the Map: the Mythotopia Geo-tagged Corpus
Voula Giouli | Anna Vacalopoulou | Nikolaos Sidiropoulos | Christina Flouda | Athanasios Doupas | Giorgos Giannopoulos | Nikos Bikakis | Vassilis Kaffes | Gregory Stainhaouer
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The paper gives an account of an infrastructure that will be integrated into a platform aimed at providing a multi-faceted experience to visitors of Northern Greece using mythology as a starting point. This infrastructure comprises a multi-lingual and multi-modal corpus (i.e., a corpus of textual data supplemented with images and video) that belongs to the humanities domain, along with a dedicated database (content management system) with advanced indexing, linking and search functionalities. We present the corpus itself, focusing on the content, the methodology adopted for its development, and the steps taken towards rendering it accessible via the database in a way that also facilitates useful visualizations. In this context, we tried to address three main challenges: (a) to add a novel annotation layer, namely geotagging; (b) to ensure the long-term maintenance of and accessibility to the highly heterogeneous primary data, even after the life cycle of the current project, by adopting a metadata schema that is compatible with existing standards; and (c) to render the corpus a useful resource for scholarly research in the digital humanities by adding a minimum set of linguistic annotations.
2020
Edition 1.2 of the PARSEME Shared Task on Semi-supervised Identification of Verbal Multiword Expressions
Carlos Ramisch | Agata Savary | Bruno Guillaume | Jakub Waszczuk | Marie Candito | Ashwini Vaidya | Verginica Barbu Mititelu | Archna Bhatia | Uxoa Iñurrieta | Voula Giouli | Tunga Güngör | Menghan Jiang | Timm Lichte | Chaya Liebeskind | Johanna Monti | Renata Ramisch | Sara Stymne | Abigail Walsh | Hongzhi Xu
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
We present edition 1.2 of the PARSEME shared task on identification of verbal multiword expressions (VMWEs). Lessons learned from previous editions indicate that VMWEs have low ambiguity, and that the major challenge lies in identifying test instances never seen in the training data. Therefore, this edition focuses on unseen VMWEs. We have split annotated corpora so that the test corpora contain around 300 unseen VMWEs, and we provide non-annotated raw corpora to be used by complementary discovery methods. We released annotated and raw corpora in 14 languages, and this semi-supervised challenge attracted 7 teams who submitted 9 system results. This paper describes the effort of corpus creation, the task design, and the results obtained by the participating systems, especially their performance on unseen expressions.
Greek within the Global FrameNet Initiative: Challenges and Conclusions so far
Voula Giouli | Vera Pilitsidou | Hephaestion Christopoulos
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet
Large-coverage lexical resources that bear deep linguistic information have always been considered useful for many natural language processing (NLP) applications, including Machine Translation (MT). In this respect, Frame-based resources have been developed for many languages following Frame Semantics and the Berkeley FrameNet project. However, to a great extent, all those efforts have remained fragmented. Consequently, the Global FrameNet initiative has been conceived as a joint effort to bring together FrameNets in different languages. This paper describes ongoing work towards developing the Greek (EL) counterpart of the Global FrameNet and our efforts to contribute to the Shared Annotation Task. We elaborate on the annotation methodology employed, the current status and progress made so far, as well as the problems raised during annotation.
2018
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.
2017
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
2014
Encoding MWEs in a conceptual lexicon
Aggeliki Fotopoulou | Stella Markantonatou | Voula Giouli
Proceedings of the 10th Workshop on Multiword Expressions (MWE)
Linguistically motivated Language Resources for Sentiment Analysis
Voula Giouli | Aggeliki Fotopoulou
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
2009
A Web-Enabled and Speech-Enhanced Parallel Corpus of Greek-Bulgarian Cultural Texts
Voula Giouli | Nikos Glaros | Kiril Simov | Petya Osenova
Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH – SHELT&R 2009)
2008
Building a Greek corpus for Textual Entailment
Evi Marzelou | Maria Zourari | Voula Giouli | Stelios Piperidis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The paper reports on completed work aimed at the creation of a resource, namely the Greek Textual Entailment Corpus (GTEC), that is appropriate for guiding the training and evaluation of a system that recognizes Textual Entailment in Greek texts. The corpus of textual units was collected in view of a range of NLP applications where semantic interpretation is of paramount importance, and it was manually annotated at the level of Textual Entailment. Moreover, a number of linguistic annotations that were deemed useful for prospective system developers were also integrated. The critical issue was the development of a final resource that is re-usable and adaptable to different NLP systems, in order either to enhance their accuracy or to evaluate their output. Here we focus on the methodological issues underpinning data selection and annotation. An initial approach towards the development of a system for the automatic Recognition of Textual Entailment in Greek is also presented, and preliminary results are reported.
2006
Multi-domain Multi-lingual Named Entity Recognition: Revisiting & Grounding the resources issue
Voula Giouli | Alexis Konstandinidis | Elina Desypri | Harris Papageorgiou
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The paper reports on the development methodology of a system aimed at multi-domain multi-lingual recognition and classification of names in texts, the focus being on the linguistic resources used for training and testing purposes. The corpus presented here has been collected and annotated in the framework of different projects, the critical issue being the development of a final resource that is homogeneous, re-usable and adaptable to different domains and languages with a view to robust multi-domain and multi-lingual NERC.
Language Resources Production Models: the Case of the INTERA Multilingual Corpus and Terminology
Maria Gavrilidou | Penny Labropoulou | Stelios Piperidis | Voula Giouli | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Khalid Choukri
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper reports on the multilingual Language Resources (MLRs), i.e. parallel corpora and terminological lexicons for less widely digitally available languages, that have been developed in the INTERA project, and on the methodology adopted for their production. Special emphasis is given to the practical factors that have influenced the MLR development approach and the final composition of the resources. Building on the experience gained in the project, a production model has been elaborated, suggesting ways and techniques that can be exploited to improve LR production while taking realistic constraints into account.
2004
Building Parallel Corpora for eContent Professionals
M. Gavrilidou | P. Labropoulou | E. Desipri | V. Giouli | V. Antonopoulos | S. Piperidis
Proceedings of the Workshop on Multilingual Linguistic Resources
2002
Multi-level XML-based Corpus Annotation
Harris Papageorgiou | Prokopis Prokopidis | Voula Giouli | Iason Demiros | Alexis Konstantinidis | Stelios Piperidis
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
2000
A Robust Parser for Unrestricted Greek Text
Sotiris Boutsis | Prokopis Prokopidis | Voula Giouli | Stelios Piperidis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
Co-authors
- Stelios Piperidis 7
- Verginica Barbu Mititelu 6
- Chaya Liebeskind 6
- Stella Markantonatou 6
- Carlos Ramisch 6
- Agata Savary 6
- Archna Bhatia 5
- Ivelina Stoyanova 5
- Marie Candito 4
- Bruno Guillaume 4
- Johanna Monti 4
- Harris Papageorgiou 4
- Abigail Walsh 4
- Kaja Dobrovoljc 3
- Kilian Evang 3
- Polona Gantar 3
- Tunga Gungor 3
- Uxoa Iñurrieta 3
- Timm Lichte 3
- Irina Lobzhanidze 3
- Petya Osenova 3
- Prokopis Prokopidis 3
- Mehrnoush Shamsfard 3
- Sara Stymne 3
- Veronika Vincze 3
- Sotiris Boutsis 2
- Silvio Cordeiro 2
- Iason Demiros 2
- Elina Desipri 2
- A. Seza Doğruöz 2
- Gülşen Eryiğit 2
- Aggeliki Fotopoulou 2
- Marcos Garcia 2
- Maria Gavrilidou 2
- Lifeng Han 2
- Olha Kanishcheva 2
- Alexis Konstantinidis 2
- Gražina Korvel 2
- Jolanta Kovalevskaitė 2
- Simon Krek 2
- Cvetana Krstev 2
- Penny Labropoulou 2
- Joakim Nivre 2
- Atul Kr. Ojha 2
- Adriana Silvina Pagano 2
- Vasile Pais 2
- Carla Parra Escartín 2
- Thomas Pickard 2
- Behrang QasemiZadeh 2
- Alexandre Rademaker 2
- Renata Ramisch 2
- Vahide Tajalli 2
- Carole Tiberius 2
- Ashwini Vaidya 2
- Beata Wójtowicz 2
- Hongzhi Xu 2
- Daniel Zeman 2
- Manzura Abjalova 1
- Sopuruchi Christian Aboh 1
- Ágnes Abuczki 1
- Maha Tufail Agro 1
- Sarfraz Ahmad 1
- Momina Ahsan 1
- Dina Almassova 1
- Diego Alves 1
- V. Antonopoulos 1
- Doğukan Arslan 1
- Verginica Barbu Mititelu 1
- Anabela Barreiro 1
- Eduard Bejček 1
- Chérifa Ben Khelil 1
- Nikos Bikakis 1
- Eric Bilinski 1
- Terra Blevins 1
- Gosse Bouma 1
- Maja Buljan 1
- Olesea Caftanatov 1
- Nicoletta Calzolari 1
- Fabienne Cap 1
- Aida Cardoso 1
- Maria Chatzigrigoriou 1
- Christian Chiarcos 1
- Khalid Choukri 1
- Hephaestion Christopoulos 1
- Matthieu Constant 1
- Antoine Doucet 1
- Athanasios Doupas 1
- Roberto Díaz Hernández 1
- Nilay Erdem Ayyıldız 1
- Doruk Eryiğit 1
- Louis Estève 1
- Victoria Fendel 1
- Christina Flouda 1
- Radovan Garabik 1
- Albert Gatt 1
- Xenophon Gialis 1
- Giorgos Giannopoulos 1
- Petra Giommarelli 1
- Nikos Glaros 1
- Shahar Golan 1
- Hila Gonen 1
- Najet Hadj Mohamed 1
- Isabell Stinessen Haugen 1
- Abdelati Hawwari 1
- Wei He 1
- Carlos Manuel Hidalgo-Ternero 1
- Nina Hosseini-Kivanani 1
- Eugene Jang 1
- Shaoxiong Ji 1
- Menghan Jiang 1
- Danka Jokić 1
- Vassilis Kaffes 1
- Anna Kanellopoulou 1
- Jun Kevin 1
- Muhammad Ahsan Riaz Khan 1
- Eungseo Kim 1
- Jauza Akbar Krito 1
- Alesia Lazarenka 1
- Maria Liakata 1
- Noémi Ligeti-Nagy 1
- Veronika Lipp 1
- Nikola Ljubešić 1
- Rusudan Makhachashvili 1
- Aleksandra Markovic 1
- Jelena M. Marković 1
- Aleksandra M. Marković 1
- Evi Marzelou 1
- Stephen Mayhew 1
- Nurit Melnik 1
- Amália Mendes 1
- Shachar Mirkin 1
- Verginica Mititelu 1
- Maria Mitrofan 1
- Monica Monachini 1
- Numaan Naeem 1
- Takuya Nakamura 1
- Gunta Nešpore-Bērzkalne 1
- Sanni Nimb 1
- Nathalie Carmen Hau Norman 1
- Sussi Olsen 1
- Daniil Orel 1
- Aakanksha Padhye 1
- Bolette Sandford Pedersen 1
- Marija Pendevska 1
- Fred Philippy 1
- Vera Pilitsidou 1
- Yuval Pinter 1
- Salsabila Zahirah Pranida 1
- María Del Mar Sánchez Ramos 1
- Rozane Rebechi 1
- Laura Rituma 1
- Ieva Rizgeliene 1
- Antoni Brosa Rodríguez 1
- Zahra Saaberi 1
- Federico Sangati 1
- Josue Alejandro Sauca 1
- Nathan Schneider 1
- Manon Scholivet 1
- Regina E. Semou 1
- Jeongyeon Seo 1
- Masoumeh Seyyedrezaei 1
- Sarvinoz Sharipova 1
- Nikolaos Sidiropoulos 1
- Kiril Simov 1
- Inguna Skadina 1
- Claudia Soria 1
- Gregory Stainhauer 1
- Ranka Stankovic 1
- Ranka Stanković 1
- Srdjan Sucur 1
- Marek Suppa 1
- Shiva Taslimipoor 1
- Dilara Torunoğlu-Selamet 1
- Samia Touileb 1
- Eleni Triantafyllidi 1
- Kingsley O. Ugwuanyi 1
- Anna Vacalopoulou 1
- Baiba Valkovska 1
- Giedre Valunaite Oleskeviciene 1
- Erik Velldal 1
- Aline Villavicencio 1
- Jakub Waszczuk 1
- Rodrigo Wilkens 1
- Alina Wróblewska 1
- Zhuohan Xie 1
- Olha Yatsyshyna 1
- Yelda Yeşildal Eraydın 1
- Maria Zourari 1
- Marie-Catherine de Marneffe 1
- Lilja Øvrelid 1
- Jaka Čibej 1