Jasper Degraeuwe


2025

pdf bib
You Shall Know a Word’s Difficulty by the Family It Keeps: Word Family Features in Personalised Word Difficulty Classifiers for L2 Spanish
Jasper Degraeuwe
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Designing vocabulary learning activities for foreign/second language (L2) learners highly depends on the successful identification of difficult words. In this paper, we present a novel personalised word difficulty classifier for L2 Spanish, using the LexComSpaL2 corpus as training data and a BiLSTM model as the architecture. We train a base version (using the original LexComSpaL2 data) and a word family version of the classifier (adding word family knowledge as an extra feature). The base version obtains reasonably good performance (F1 = 0.53) and shows weak positive predictive power (φ = 0.32), underlining the potential of automated methods in determining vocabulary difficulty for individual L2 learners. The “word family classifier” is able to further push performance (F1 = 0.62 and φ = 0.45), highlighting the value of well-chosen linguistic features in developing word difficulty classifiers.

2024

pdf bib
LexComSpaL2: A Lexical Complexity Corpus for Spanish as a Foreign Language
Jasper Degraeuwe | Patrick Goethals
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present LexComSpaL2, a novel corpus which can be employed to train personalised word-level difficulty classifiers for learners of Spanish as a foreign/second language (L2). The dataset contains 2,240 in-context target words with the corresponding difficulty judgements of 26 Dutch-speaking students who are learning Spanish as an L2, resulting in a total of 58,240 annotations. The target words are divided over 200 sentences from 4 different domains (economics, health, law, and migration) and have been selected based on their suitability to be included in L2 learning materials. As our annotation scheme, we use a customised version of the 5-point lexical complexity prediction scale (Shardlow et al., 2020), tailored to the vocabulary knowledge continuum (which ranges from no knowledge over receptive mastery to productive mastery; Schmitt, 2019). With LexComSpaL2, we aim to address the lack of relevant data for multi-category difficult prediction at word level for L2 learners of other languages than English.

pdf bib
Leading by Example: The Use of Generative Artificial Intelligence to Create Pedagogically Suitable Example Sentences
Jasper Degraeuwe | Patrick Goethals
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning

2022

pdf bib
Interactive word sense disambiguation in foreign language learning
Jasper Degraeuwe | Patrick Goethals
Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning

pdf bib
Lexical Simplification in Foreign Language Learning: Creating Pedagogically Suitable Simplified Example Sentences
Jasper Degraeuwe | Horacio Saggion
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

This study presents a lexical simplification (LS) methodology for foreign language (FL) learning purposes, a barely explored area of automatic text simplification (TS). The method, targeted at Spanish as a foreign language (SFL), includes a customised complex word identification (CWI) classifier and generates substitutions based on masked language modelling. Performance is calculated on a custom dataset by means of a new, pedagogically-oriented evaluation. With 43% of the top simplifications being found suitable, the method shows potential for simplifying sentences to be used in FL learning activities. The evaluation also suggests that, though still crucial, meaning preservation is not always a prerequisite for successful LS. To arrive at grammatically correct and more idiomatic simplifications, future research could study the integration of association measures based on co-occurrence data.