Jelena Kallas
2026
Using LLMs to Extract Instances of Schematic Constructions from Unannotated L2 Learner Corpora
Jelena Kallas | Ahto Kiil | Heete Sahkai | Geda Paulsen | Kertu Saul
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Our previous study found that generative LLMs can be used successfully to identify instances of schematic constructions (as defined in Construction Grammar) in unannotated L1 corpus data. This study tests whether LLMs can also identify instances of constructions in unannotated L2 data. L2 learner corpora are notoriously difficult to annotate and query because they contain errors, so using LLMs can simplify the retrieval of construction data from them. Identifying instances of constructions in L2 learner data has many possible uses in pedagogical applications of Construction Grammar and constructicography, such as identifying error-prone (properties of) constructions and charting the distribution of constructional instances across CEFR levels. Using the Estonian Nominal Quantifier Construction as the example construction and an Estonian CEFR-graded learner corpus as the source of L2 data, we tested several prompts and several models (OpenAI’s o3-mini, o3, gpt-5-mini and gpt-5; Google DeepMind’s Gemini Flash 2.5; Anthropic’s Claude Sonnet 4.5 and Opus 4.1). The best model, gpt-5, achieved F1-scores from 0.90 to 0.96, depending on the level of detail of the prompt.
2025
Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning
Ricardo Muñoz Sánchez | David Alfter | Elena Volodina | Jelena Kallas
2024
Leveraging Domain Corpora for Enhanced Terminology: The Case of Estonian-English Remote Sensing Termbase
Liisi Jakobson | Jelena Kallas | Erko Jakobson
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This article addresses methodological issues related to developing domain corpora and a terminological database from scratch. We present an ongoing project focused on creating an Estonian-English Remote Sensing Termbase. First, we describe the compilation process of the Estonian Remote Sensing Corpus 2022, which served as the primary data source for the termbase. The corpus was compiled by crawling the web and adding files using the Corpus Query System Sketch Engine (Kilgarriff et al., 2004). In the next step, we employed the Term Extraction module (Kilgarriff et al., 2014; Fišer et al., 2016; Blahuš et al., 2023) to identify terms, which were subsequently registered in the Estonian Remote Sensing Termbase using the Dictionary Writing System Ekilex (Tavast et al., 2018). For each term, we provided definitions, variants, and usage contexts. In the final stage, remote sensing experts reviewed and edited the terms, their variants, and usage contexts. Finally, we provide insights and outline directions for future work in this area.
2023
XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs
Federico Martelli | Andrei Stefan Bejgu | Cesare Campagnano | Jaka Čibej | Rute Costa | Apolonija Gantar | Jelena Kallas | Svetla Peneva Koeva | Kristina Koppel | Simon Krek | Margit Langemets | Veronika Lipp | Sanni Nimb | Sussi Olsen | Bolette Sanford Pedersen | Valeria Quochi | Ana Salgado | László Simon | Carole Tiberius | Rafael-J Ureña-Ruiz | Roberto Navigli
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)
2021
Estonian as a Second Language Teacher’s Tools
Tiiu Üksik | Jelena Kallas | Kristina Koppel | Katrin Tsepelina | Raili Pool
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
The paper presents the results of the project “Teacher’s Tools” (Estonian: Õpetaja tööriistad), published as a subpage of the new language portal Sõnaveeb, developed by the Institute of the Estonian Language. The toolbox includes four modules: vocabulary, grammar, communicative language activities and text evaluation. The tools are intended to help teachers and specialists of Estonian as a second language plan courses and create new educational materials, exercises and tests based on CEFR level descriptions.
2020
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi | John P. McCrae | Sanni Nimb | Fahad Khan | Monica Monachini | Bolette S. Pedersen | Thierry Declerck | Tanja Wissik | Andrea Bellandi | Irene Pisani | Thomas Troelsgård | Sussi Olsen | Simon Krek | Veronika Lipp | Tamás Váradi | László Simon | András Győrffy | Carole Tiberius | Tanneke Schoonheim | Yifat Ben Moshe | Maya Rudich | Raya Abu Ahmad | Dorielle Lonke | Kira Kovalenko | Margit Langemets | Jelena Kallas | Oksana Dereza | Theodorus Fransen | David Cillessen | David Lindemann | Mikel Alonso | Ana Salgado | José Luis Sancho | Rafael-J. Ureña-Ruiz | Jordi Porta Zamorano | Kiril Simov | Petya Osenova | Zara Kancheva | Ivaylo Radev | Ranka Stanković | Andrej Perdih | Dejan Gabrovšek
Proceedings of the Twelfth Language Resources and Evaluation Conference
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at the sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by enabling new solutions, particularly data-hungry ones such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.
Co-authors
- Kristina Koppel 2
- Simon Krek 2
- Margit Langemets 2
- Veronika Lipp 2
- Sanni Nimb 2
- Sussi Olsen 2
- Ana Salgado 2
- László Simon 2
- Carole Tiberius 2
- Rafael-J. Ureña-Ruiz 2
- Raya Abu Ahmad 1
- Sina Ahmadi 1
- David Alfter 1
- Mikel Alonso 1
- Andrei Stefan Bejgu 1
- Andrea Bellandi 1
- Yifat Ben Moshe 1
- Cesare Campagnano 1
- David Cillessen 1
- Rute Costa 1
- Thierry Declerck 1
- Oksana Dereza 1
- Theodorus Fransen 1
- Dejan Gabrovšek 1
- Apolonija Gantar 1
- András Győrffy 1
- Liisi Jakobson 1
- Erko Jakobson 1
- Zara Kancheva 1
- Fahad Khan 1
- Ahto Kiil 1
- Svetla Peneva Koeva 1
- Kira Kovalenko 1
- David Lindemann 1
- Dorielle Lonke 1
- Federico Martelli 1
- John Philip McCrae 1
- Monica Monachini 1
- Ricardo Muñoz Sánchez 1
- Roberto Navigli 1
- Petya Osenova 1
- Geda Paulsen 1
- Bolette Sandford Pedersen 1
- Andrej Perdih 1
- Irene Pisani 1
- Raili Pool 1
- Valeria Quochi 1
- Ivaylo Radev 1
- Maya Rudich 1
- Heete Sahkai 1
- José-Luis Sancho 1
- Bolette Sanford Pedersen 1
- Kertu Saul 1
- Tanneke Schoonheim 1
- Kiril Simov 1
- Ranka Stanković 1
- Thomas Troelsgård 1
- Katrin Tsepelina 1
- Elena Volodina 1
- Tamás Váradi 1
- Tanja Wissik 1
- Jordi Porta Zamorano 1
- Tiiu Üksik 1
- Jaka Čibej 1