Stefania Spina


2024

Combining Grammatical and Relational Approaches. A Hybrid Method for the Identification of Candidate Collocations from Corpora
Damiano Perri | Irene Fioravanti | Osvaldo Gervasi | Stefania Spina
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

We present an evaluation of three different methods for the automatic identification of candidate collocations in corpora, part of a research project focused on the development of a learner dictionary of Italian collocations. We compare the commonly used POS-based method and the syntactic dependency-based method with a hybrid method integrating both approaches. We conduct a statistical analysis on a sample corpus of written and spoken texts of different registers. Results show that the hybrid method correctly detects more candidate collocations when evaluated against a human-annotated benchmark. The scores are particularly high in adjectival modifier relations. A hybrid approach to candidate collocation identification seems to lead to an improvement in the quality of results.

2020

MALT-IT2: A New Resource to Measure Text Difficulty in Light of CEFR Levels for Italian L2 Learning
Luciana Forti | Giuliana Grego Bolli | Filippo Santarelli | Valentino Santucci | Stefania Spina
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents a new resource for automatically assessing text difficulty in the context of learning and teaching Italian as a second or foreign language. It is called MALT-IT2, and it automatically classifies input texts according to the CEFR level they are most likely to belong to. After an introduction to the field of automatic text difficulty assessment and an overview of previous related work, we describe the rationale of the project and the corpus and computational system on which it is based. Experiments were conducted to investigate the reliability of the system. The results show that the system obtains good prediction accuracy; a further analysis was conducted to identify the categories of features that most influenced the predictions.

2019

Measuring Text Complexity for Italian as a Second Language Learning Purposes
Luciana Forti | Alfredo Milani | Luisa Piersanti | Filippo Santarelli | Valentino Santucci | Stefania Spina
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

The selection of texts for second language learning purposes typically relies on teachers’ and test developers’ individual judgment of the observable qualitative properties of a text. Little or no consideration is generally given to the quantitative dimension within an evidence-based framework of reproducibility. This study aims to fill the gap by evaluating the effectiveness of an automatic tool trained to assess text complexity in the context of learning Italian as a second language. A dataset of texts labeled by expert test developers was used to evaluate the performance of three classifier models (decision tree, random forest, and support vector machine), which were trained using linguistic features measured quantitatively and extracted from the texts. The experimental analysis provided satisfactory results, including with respect to which kinds of linguistic traits contributed the most to the final outcome.

2010

The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment
Stefania Spina
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, I introduce the DICI, an electronic dictionary of Italian collocations designed to support the acquisition of collocational competence in learners of Italian as a second or foreign language. I briefly describe the composition of the reference Italian corpus from which the collocations are extracted, and the methodology for extracting and filtering candidate collocations. It is an experimental methodology, based on POS filtering, frequency and statistical measures, and tested on a 12-million-word sample from the reference corpus. Furthermore, I explain the main criteria for the composition of the dictionary, in addition to its integration with a Virtual Learning Environment (VLE), aimed at supporting learning activities on collocations. I briefly describe some of the main features of this integration with the VLE, such as the automatic recognition of collocations in written Italian texts, the possibility for students to obtain further linguistic information on selected collocations, and the automatic generation of tests for assessing the collocational competence of language learners. While the main goal of the DICI is pedagogical, it is also intended to contribute to research in the field of collocations.
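
To make the phrase "POS filtering, frequency and statistical measures" concrete, here is a minimal Python sketch of that general kind of pipeline. It is not the DICI implementation: the POS patterns, the tag names, the frequency threshold, the choice of pointwise mutual information (PMI) as the statistical measure, and the function name `candidate_collocations` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical POS patterns (assumption; the abstract does not list the actual filters).
POS_PATTERNS = {("ADJ", "NOUN"), ("NOUN", "ADJ"), ("VERB", "NOUN")}


def candidate_collocations(tagged, min_freq=2):
    """Return (bigram, frequency, PMI) triples for adjacent word pairs.

    tagged   -- list of (word, pos_tag) tuples in running-text order
    min_freq -- illustrative frequency threshold (assumption)
    """
    words = [w.lower() for w, _ in tagged]
    unigrams = Counter(words)
    bigrams = Counter()

    # Step 1: POS filtering over adjacent token pairs.
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in POS_PATTERNS:
            bigrams[(w1.lower(), w2.lower())] += 1

    n = len(words)
    n_pairs = max(n - 1, 1)
    results = []
    for (w1, w2), freq in bigrams.items():
        # Step 2: frequency filtering.
        if freq < min_freq:
            continue
        # Step 3: a statistical association measure (PMI, as one common choice).
        p_pair = freq / n_pairs
        p_w1 = unigrams[w1] / n
        p_w2 = unigrams[w2] / n
        results.append(((w1, w2), freq, math.log2(p_pair / (p_w1 * p_w2))))

    # Highest-scoring candidates first.
    return sorted(results, key=lambda r: r[2], reverse=True)


# Toy input with made-up tags, only to show the call shape.
sample = [
    ("forte", "ADJ"), ("pioggia", "NOUN"), ("cade", "VERB"),
    ("forte", "ADJ"), ("pioggia", "NOUN"), ("oggi", "ADV"),
]
for bigram, freq, pmi in candidate_collocations(sample):
    print(bigram, freq, round(pmi, 3))
```

A sketch like this only looks at adjacent tokens; the dependency-based and hybrid methods evaluated in the 2024 paper listed above address word pairs that are syntactically related but not necessarily adjacent.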