Large language models are increasingly used for teaching and self-directed learning of foreign languages. However, their ability to meet specific linguistic constraints is still underexplored. This study compares the effectiveness of prompt engineering in guiding ChatGPT (GPT-4o and GPT-4o-mini) and Llama 3 to rephrase general-domain texts so that they meet CEFR A1-level constraints in English and Italian, making them suitable for beginner learners. It compares four prompt-engineering approaches built upon an iterative paraphrasing method that gradually refines the original texts towards CEFR compliance. The approaches include paraphrasing with or without Chain-of-Thought, as well as grammar and vocabulary simplification performed either simultaneously or as separate steps. The findings suggest that for English the best approach combines Chain-of-Thought with separate grammar and vocabulary simplification, while for Italian one-step strategies yield better grammatical compliance and two-step strategies provide better vocabulary coverage. The paraphrasing approach can improve compliance, although at this point it is not cost-effective. We release a dataset of pairs of original sentences and their beginner-level paraphrases (in both English and Italian) on which further work can build.
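As a rough illustration of how an iterative, two-step (grammar first, then vocabulary) Chain-of-Thought pipeline of this kind might be wired up, the sketch below uses the OpenAI Python client. The model identifier, the prompt wording, and the `is_a1_compliant` check are placeholders introduced here for illustration, not the prompts or tooling used in the paper.

```python
# Minimal sketch of an iterative, two-step CoT simplification loop toward CEFR A1.
# Assumptions: model name, prompt wording, and the compliance check are illustrative.
from openai import OpenAI

client = OpenAI()

GRAMMAR_PROMPT = (
    "Think step by step about which grammatical structures exceed CEFR A1, "
    "then rewrite the text using only A1-level grammar:\n\n{text}"
)
VOCAB_PROMPT = (
    "Think step by step about which words exceed CEFR A1, "
    "then replace them with A1-level vocabulary:\n\n{text}"
)

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model identifier
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def is_a1_compliant(text: str) -> bool:
    # Placeholder: a real check would use a CEFR classifier or A1 word/grammar lists.
    return False

def simplify_to_a1(text: str, max_rounds: int = 3) -> str:
    """Two-step simplification (grammar, then vocabulary), repeated until the
    (hypothetical) compliance check passes or the round budget is spent."""
    for _ in range(max_rounds):
        text = ask(GRAMMAR_PROMPT.format(text=text))  # step 1: grammar
        text = ask(VOCAB_PROMPT.format(text=text))    # step 2: vocabulary
        if is_a1_compliant(text):
            break
    return text
```

A one-step variant would simply merge the two prompts into a single instruction covering grammar and vocabulary at once.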
In this paper, we propose a new annotation scheme for classifying different types of clauses in Terms-and-Conditions contracts, with the ultimate goal of supporting legal experts in quickly identifying and assessing problematic issues in this type of legal document. To this end, we built a small corpus of Terms-and-Conditions contracts and finalized an annotation scheme of 14 categories, eventually reaching an inter-annotator agreement of 0.92. Then, for 11 of these categories, we experimented with binary classification tasks using few-shot prompting with a multilingual T5 model and fine-tuned versions of two BERT-based LLMs for Italian. Our experiments show the feasibility of automatically classifying our categories, with accuracies ranging from 0.79 to 0.95 on the validation tasks.
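For concreteness, the following sketch shows what one of the per-category binary classifiers could look like when fine-tuning an Italian BERT-based model with Hugging Face Transformers. The checkpoint name, hyperparameters, and the two toy clauses are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of one per-category binary classifier for T&C clauses.
# Assumptions: checkpoint, hyperparameters, and toy data are illustrative only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "dbmdz/bert-base-italian-xxl-cased"  # assumed Italian BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy examples: label 1 = clause belongs to the target category, 0 = it does not.
data = Dataset.from_dict({
    "text": [
        "Il fornitore può modificare il servizio in qualsiasi momento.",
        "Il contratto è regolato dalla legge italiana.",
    ],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tc-clause-clf",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()
```

Training one such binary model per category (11 in total) mirrors the per-category setup described above; the few-shot T5 alternative would instead prompt a multilingual T5 model with a handful of labeled clauses and ask it to label a new one.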