Diego Rossini
2026
Binary Token-Level Classification with DeBERTa for All-Type MWE Identification: A Lightweight Approach with Linguistic Enhancement
Diego Rossini | Lonneke Van Der Plas
Findings of the Association for Computational Linguistics: EACL 2026
We present a comprehensive approach for multiword expression (MWE) identification that combines binary token-level classification, linguistic feature integration, and data augmentation. Our DeBERTa-v3-large model achieves 69.8% F1 on the CoAM dataset, surpassing the best previous result on this dataset (Qwen-72B, 57.8% F1) by 12 points while using 165 times fewer parameters. We achieve this performance by (1) reformulating detection as binary token-level START/END/INSIDE classification rather than span-based prediction, (2) incorporating NP chunking and dependency features that help identify discontinuous and NOUN-type MWEs, and (3) applying oversampling to address severe class imbalance in the training data. We confirm that our method generalizes on the STREUSLE dataset, achieving 78.9% F1. These results demonstrate that carefully designed smaller models can substantially outperform LLMs on structured NLP tasks, with important implications for resource-constrained deployments.
The MultiplEYE Text Corpus: Towards a Diverse and Ever-Expanding Multilingual Text Corpus
Ramunė Kasperė | Anna Bondar | Sergiu Nisioi | Maja Stegenwallner-Schütz | Hanne B. Søndergaard Knudsen | Ana Matić | Eva Pavlinušić Vilus | Dorota Klimek-Jankowska | Chiara Tschirner | Not Battesta Soliva | Deborah N. Jakobi | Cui Ding | Dima Abu Romi | Cengiz Acarturk | Matilda Agdler | Anton Marius Alexandru | Mohd Faizan Ansari | Annalisa Arcidiacono | Elizabete Ausma Velta Barisa | Ana Bautista | Lisa Beinborn | Yevgeni Berzak | Nedeljka Bjelanović | Anna Isabelle Bothmann | Jan Brasser | Caterina Cacioli | Anila Çepani | Ilze Ceple | Adelina Cerpja | Dalí Chirino | Jan Chromý | Alessandro Corona Mendozza | Iria de-Dios-Flores | Nazik Dinçtopal Deniz | Ana Došen | Kristian Elersič | Inmaculada Fajardo | Zigmunds Freibergs | Angelina Ganebnaya | Shan Gao | Jéssica Gomes | Annjo Klungervik Greenall | Alba Haveriku | Miao He | Anamaria Hodivoianu | Yu-Yin Hsu | Amanda Isaksen | Andreia Janeiro | Kristine Jensen de López | Aleksandar Jevremovic | Vojislav Jovanovic | Hanna Kędzierska | Nik Kharlamov | Sara Kosutar | Nelda Kote | Vanja Kovic | Izabela Krejtz | Thyra Krosness | Oleksandra Kuvshynova | Eilam Lavy | Ella Lion | Marta Łockiewicz | Kaidi Lõo | Paula Luegi | Mircea Mihai Marin | Clara Martin | Svitlana Matvieieva | Diane C. Mézière | Xavier Mínguez-López | Valeriia Modina | Jurgita Motiejūnienė | Marie-Luise Müller | Tolgonai Nasipbek kyzy | Jamal Abdul Nasir | Johanne S. K. Nedergård | Ayşegül Özkan | Patrizia Paggio | Marijan Palmović | Maria Christina Panagiotopoulou | Alberto Parola | Helena Pérez | Klaudia Petersen | Anja Podlesek | Eva Pospíšilová | Marta Praulina | Mikuláš Preininger | Loredana Pungă | Diego Rossini | Špela Rot | Habib Sani Yahaya | Irina A. Sekerina | Anne Gabija Skadina | Jordi Solé-Casals | Lonneke van der Plas | Saara M. Varjopuro | Spyridoula Varlokosta | João Veríssimo | Oskari Juhapekka Virtanen | Nemanja Vračar | Mila Vulchanova | Ahmad Mustapha Wali | Peizheng Wu | Nilgün Yücel | Stefan Frank | Nora Hollenstein | Lena Jäger
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present the MultiplEYE Text Corpus, a large-scale, document-level, multi-parallel resource designed to advance cross-linguistic research on reading and language processing. The corpus provides paragraph-level alignment for texts in 39 languages spanning seven language families and seven scripts. Unlike many existing multilingual corpora, a substantial number of documents were originally written in languages other than English, reducing English-centric bias and supporting more typologically diverse investigations. The texts are carefully selected to balance linguistic richness with experimental feasibility, particularly for eye-tracking-while-reading studies. Developed within a multi-lab initiative, the MultiplEYE Text Corpus follows unified translation, alignment, and experimental design guidelines to ensure cross-linguistic comparability. Its inclusion of texts varying in type and difficulty enables research on discourse-level processing, genre effects, and individual differences across a wide range of languages. The text corpus and accompanying metadata provide a robust foundation for multilingual psycholinguistic and computational modeling research. Data and materials are publicly available at https://doi.org/10.23668/psycharchives.21750.
2024
A Modal Sense Classifier for the French Modal Verb Pouvoir
Anna Colli | Diego Rossini | Delphine Battistelli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
In this paper we address the problem of modal sense classification for the French modal verb pouvoir in a transcribed spoken corpus. To the best of our knowledge, no studies have focused on this task in French. We fine-tuned various BERT-based models for French to determine which one performed best. We found that the Flaubert-base-cased model was the most effective (F1-score of 0.94) and that the most frequent categories in our corpus were material possibility and ability, both of which belong to the broader alethic category.
Co-authors
- Lonneke van der Plas 2
- Jamal Abdul Nasir 1
- Dima Abu Romi 1
- Cengiz Acarturk 1
- Matilda Agdler 1
- Anton Marius Alexandru 1
- Mohd Faizan Ansari 1
- Annalisa Arcidiacono 1
- Hanne B. Søndergaard Knudsen 1
- Elizabete Ausma Velta Barisa 1
- Not Battesta Soliva 1
- Delphine Battistelli 1
- Ana Bautista 1
- Lisa Beinborn 1
- Yevgeni Berzak 1
- Nedeljka Bjelanović 1
- Anna Bondar 1
- Anna Isabelle Bothmann 1
- Jan Brasser 1
- Caterina Cacioli 1
- Ilze Ceple 1
- Adelina Cerpja 1
- Dalí Chirino 1
- Jan Chromý 1
- Anna Colli 1
- Alessandro Corona Mendozza 1
- Nazik Dinctopal Deniz 1
- Cui Ding 1
- Ana Došen 1
- Kristian Elersič 1
- Inmaculada Fajardo 1
- Stefan L. Frank 1
- Zigmunds Freibergs 1
- Angelina Ganebnaya 1
- Shan Gao 1
- Jéssica Gomes 1
- Annjo Klungervik Greenall 1
- Alba Haveriku 1
- Miao He 1
- Anamaria Hodivoianu 1
- Nora Hollenstein 1
- Yu-Yin Hsu 1
- Amanda Isaksen 1
- Deborah N. Jakobi 1
- Andreia Janeiro 1
- Kristine Jensen de López 1
- Aleksandar Jevremovic 1
- Vojislav Jovanovic 1
- Lena Ann Jäger 1
- Ramunė Kasperė 1
- Nik Kharlamov 1
- Dorota Klimek-Jankowska 1
- Nelda Kote 1
- Vanja Kovic 1
- Sara Košutar 1
- Izabela Krejtz 1
- Thyra Krosness 1
- Oleksandra Kuvshynova 1
- Hanna Kędzierska 1
- Eilam Lavy 1
- Ella Lion 1
- Paula Luegi 1
- Kaidi Lõo 1
- Mircea Mihai Marin 1
- Clara Martin 1
- Ana Matić 1
- Svitlana Matvieieva 1
- Valeriia Modina 1
- Jurgita Motiejūnienė 1
- Diane C. Mézière 1
- Xavier Mínguez-López 1
- Marie-Luise Müller 1
- Tolgonai Nasipbek kyzy 1
- Johanne S. K. Nedergård 1
- Sergiu Nisioi 1
- Patrizia Paggio 1
- Marijan Palmović 1
- Maria Christina Panagiotopoulou 1
- Alberto Parola 1
- Eva Pavlinušić Vilus 1
- Klaudia Petersen 1
- Anja Podlesek 1
- Eva Pospíšilová 1
- Marta Praulina 1
- Mikuláš Preininger 1
- Loredana Pungă 1
- Helena Pérez 1
- Špela Rot 1
- Habib Sani Yahaya 1
- Irina A. Sekerina 1
- Anne Gabija Skadina 1
- Jordi Solé-Casals 1
- Maja Stegenwallner-Schütz 1
- Chiara Tschirner 1
- Saara M. Varjopuro 1
- Spyridoula Varlokosta 1
- João Veríssimo 1
- Oskari Juhapekka Virtanen 1
- Nemanja Vračar 1
- Mila Vulchanova 1
- Ahmad Mustapha Wali 1
- Peizheng Wu 1
- Nilgün Yücel 1
- Iria de-Dios-Flores 1
- Anila Çepani 1
- Ayşegül Özkan 1
- Marta Łockiewicz 1