Luciana Forti
This paper reports on a study comparing AI and human performance in detecting and categorising errors in L2 Italian texts. Four LLMs were considered: ChatGPT, Copilot, Gemini, and Llama3. Two groups of human annotators were involved: L1 and L2 speakers of Italian. A gold standard set of annotations was developed, and a fine-grained annotation scheme was adopted to reflect the specific traits of Italian morphosyntax and the learner errors associated with them. Overall, we found that human annotation outperforms AI, with some degree of variation across specific error types. Increased attention to languages other than English in NLP may significantly improve AI performance in this task, which is pivotal for many domains of language-related disciplines.
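As a rough illustration of how an LLM might be queried for this kind of error annotation, consider the Python sketch below. It assumes the OpenAI Python client; the model name, prompt wording, and error categories are illustrative placeholders, not the annotation scheme used in the study.

```python
# Minimal sketch of asking an LLM to detect and categorise learner errors.
# Assumes the OpenAI Python client; the prompt, model name, and the error
# categories shown are illustrative, not the paper's annotation scheme.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ERROR_CATEGORIES = ["agreement", "verb morphology", "preposition", "article"]

def annotate_errors(l2_text: str) -> str:
    """Ask the model to list errors in an L2 Italian text, one per line."""
    prompt = (
        "Identify the learner errors in the following Italian text. "
        f"For each error, give the span and one category from {ERROR_CATEGORIES}.\n\n"
        f"Text: {l2_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output eases comparison with human annotation
    )
    return response.choices[0].message.content

print(annotate_errors("Ieri io ha andato al mercato con mia amici."))
```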
This paper presents a new resource for automatically assessing text difficulty in the context of Italian as a second or foreign language learning and teaching. It is called MALT-IT2, and it automatically classifies inputted texts according to the CEFR level they are more likely to belong to. After an introduction to the field of automatic text difficulty assessment, and an overview of previous related work, we describe the rationale of the project, the corpus and computational system it is based on. Experiments were conducted in order to investigate the reliability of the system. The results show that the system is able to obtain a good prediction accuracy, while a further analysis was conducted in order to identify the categories of features which mostly influenced the predictions.
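For exposition only, the sketch below shows the kind of surface-level quantitative features a readability classifier of this sort typically computes; the feature set is an assumption and does not reproduce MALT-IT2's actual pipeline.

```python
# Sketch of surface-level feature extraction of the kind text-difficulty
# classifiers often rely on; the feature set is assumed for illustration.
import re

def extract_features(text: str) -> dict:
    """Compute a few quantitative traits often correlated with difficulty."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    types = set(tokens)
    return {
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
        "type_token_ratio": len(types) / max(len(tokens), 1),
        "mean_word_length": sum(len(t) for t in tokens) / max(len(tokens), 1),
    }

print(extract_features(
    "Il gatto dorme. La situazione economica internazionale rimane complessa."
))
```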
The selection of texts for second language learning purposes typically relies on teachers’ and test developers’ individual judgment of the observable qualitative properties of a text; little or no consideration is generally given to the quantitative dimension within an evidence-based, reproducible framework. This study aims to fill that gap by evaluating the effectiveness of an automatic tool trained to assess text complexity in the context of Italian as a second language learning. A dataset of texts labeled by expert test developers was used to evaluate the performance of three classifier models (decision tree, random forest, and support vector machine), trained on quantitative linguistic features extracted from the texts. The experimental analysis yielded satisfactory results, including an indication of which kinds of linguistic traits contributed most to the final predictions.
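As a sketch of the three-way classifier comparison described above, the following uses scikit-learn's implementations of the named model families; the feature matrix and labels are synthetic placeholders standing in for the extracted linguistic features and expert-assigned difficulty labels.

```python
# Sketch of comparing the three classifier families named in the abstract;
# X and y are synthetic stand-ins for the linguistic feature vectors and
# expert-assigned difficulty labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))    # 300 texts x 12 linguistic features
y = rng.integers(0, 3, size=300)  # 3 difficulty classes, e.g. CEFR bands

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

For the tree-based models, the fitted `feature_importances_` attribute offers one way to inspect which features drive the predictions, in the spirit of the trait analysis the abstract describes.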