Giulia Pucci

2025

pdf bib abs
R2-MultiOmnia: Leading Multilingual Multimodal Reasoning via Self-Training
Leonardo Ranaldi | Federico Ranaldi | Giulia Pucci
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Reasoning is an intricate process that transcends both language and vision; yet, despite its inherently modality-agnostic nature, develop-ing effective multilingual and multimodal reasoning capabilities remains a substantial challenge for Multimodal Large Language Models (MLLMs). They struggle to activate complex reasoning behaviours, delivering step-wise explanation, questioning and reflection, particularly in multilingual settings where high-quality supervision across languages is lacking. Recent works have introduced eclectic strategies to enhance MLLMs’ reasoning; however, they remain related to a single language.To make MLLMs’ reasoning capabilities aligned among languages and improve modality performances, we propose R2-MultiOmnia, a modular approach that instructs the models to abstract key elements of the reasoning process and then refine reasoning trajectories via self-correction. Specifically, we instruct the models producing multimodal synthetic resources by bridging modalities and then self-improving their capabilities. To stabilise learning and the reasoning processes structure, we propose Curriculum Learning Reasoning Stabilisation with structured output rewards to gradually refine the models’ capabilities to learn and deliver robust reasoning processes. Experiments show that R2-MultiOmnia improves multimodal reasoning, gets aligned performances among the languages approaching strong models.

pdf bib abs
Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent
Giulia Pucci | Leonardo Ranaldi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have demonstrated capabilities that are highly satisfactory to a wide range of users by adapting to their culture and wisdom. Yet, this can translate into a propensity to produce responses that align with users’ viewpoints, even when the latter are wrong. This behaviour is known as sycophancy, the tendency of LLMs to generate misleading responses as long as they align with the user’s, inducing bias and reducing reliability. To make interactions consistent, reliable and safe, we introduce X-Agent, an Oversight Reasoning framework that audits human–LLM dialogues, reasons about them, captures sycophancy and corrects the final outputs. Concretely, X-Agent extends debate-based frameworks by (i) auditing human-LLM conversations, (ii) applying a defence layer that steers model behaviour and goes beyond user beliefs, and (iii) extracting reasoning traces from evaluations that serve as training signals for mitigating sycophancy. We evaluate X-Agent across diverse scenarios and languages, showing that it consistently detects sycophancy, reduces unwarranted agreement, and improves cross-turn consistency, advancing a reasoning-as-overview paradigm for safer LLM interaction. Our approach introduces a novel paradigm in which reasoning is not merely a means to solve problems, but as a mechanism for overseeing the problem-solving processes of other models.

pdf bib abs
Exploring Backward Reasoning in Large Language Models
Leonardo Ranaldi | Giulia Pucci
Findings of the Association for Computational Linguistics: NAACL 2025

Multi-step reasoning through in-context learning strategies have been extensively explored, highlighting the abilities of Large Language Models (LLMs) to generate answers derived from step-by-step reasoning. These studies focus the attention on LLMs’ forward reasoning abilities epitomised in a series of general premises leading to a final solution. In this paper, by taking the reverse perspective, we study the backward reasoning abilities of LLMs, namely the inference that leads to the causal hypothesis. Behind formalising the backward problems, we analyse whether the LLMs are able to reason about the conclusion and reconstruct the original question that led to the delivery of the final answer. Operating with question-answering tasks involving symbolic reasoning, understanding, and commonsense abilities, we observe that the proposed models reveal robust comprehension capabilities managing different kinds of input; however, they are not always able to reason in the backward direction. Finally, to challenge this limitation, we demonstrate that instructing LLMs to generate the answer by reconsidering the structure of the problem allows for improved backward reasoning direction.

pdf bib abs
Multilingual Reasoning via Self-training
Leonardo Ranaldi | Giulia Pucci
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Although reasoning is innately language-agnostic, the multilingual capacities remains a significant challenge for large language models (LLMs). Their ability to generate structured, step-wise explanations is constantly restricted to dominant languages in pre-training data, making cross-lingual generalisation difficult and hindering broader global adoption. Recent works have introduced eclectic strategies to improve reasoning beyond English; however, these methods remain related to specific language that is not always optimal for reasoning.To improve LLMs’ multilingual reasoning abilities, we propose a modular approach that instructs the models to structure reasoning passages in a different problem space and then self-refine their capabilities to deliver step-wise reasoning passages that lead to the solution. Experiments show that our approach stably achieves significant improvements in the multilingual reasoning of various models and task, with improved reasoning consistency across languages.

2024

pdf bib abs
The limits of Italian in Reasoning Tasks
Leonardo Ranaldi | Giulia Pucci | Federico Ranaldi | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Previous studies have demonstrated the effectiveness of reasoning methods in eliciting multi-step reasoned answers from Large Language Models (LLMs) by leveraging in-context demonstrations. These methods, exemplified by Chain-of-Thought (CoT) and Program-Aided Language Models (PAL), have been shown to reason well in monolingual contexts, primarily in English. There has, however, been limited exploration of their abilities in other languages, especially in Italian.To gain a deeper understanding of the role of reasoning methods in in-context demonstrations, we propose a multidimensional analysis tailored to Italian, focusing on arithmetic and symbolic reasoning tasks. Our findings indicate that the effectiveness of reasoning methods varies significantly beyond English. Specifically, CoT, which relies on natural language demonstrations, is limited to English. Conversely, the structured nature of PAL in-context demonstrations facilitates multilingual comprehension, enabling LLMs to generate programmatic answers in Italian as well. Finally, for a more comprehensive overview, we observe that additional alignment methods do not improve downstream performances; in contrast, in some cases, they limit the abilities of the original models. This leads to significant improvements in the accuracy and quality of the generated responses.

pdf bib abs
How Far Does the Sequence of Compositions Impact Multilingual Pre-Training?
Leonardo Ranaldi | Giulia Pucci | Fabio Massimo Zanzotto
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

The most efficient strategy for conducting pre-training of language models is the concatenation of contiguous sequences of text of fixed length through causal masking that estimates the probability of each token given its context.However, the role of the composition sequence pre-training technique in the models’ generalization properties has yet to be explored.In this paper, we show that operating via causal masking impacts model performance because it could include misleading information from previous text sequences during pre-training.To fill this gap, we propose intra-context causal masking where the probability of each token is conditional only on the previous in the same chunk of text, avoiding misleading information from different contexts.Hence, we demonstrate that organizing text chunks based on a policy that aligns with text similarity effectively reduces the risk of misleading context during pre-training by enhancing language models’ in-context learning and factual knowledge storage capabilities while maintaining efficiency.

pdf bib abs
Empowering Multi-step Reasoning across Languages via Program-Aided Language Models
Leonardo Ranaldi | Giulia Pucci | Barry Haddow | Alexandra Birch
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In-context learning methods are popular inference strategies where Large Language Models (LLMs) are elicited to solve a task using provided demonstrations without parameter updates. Among these approaches are the reasoning methods, best exemplified by Chain-of-Thought (CoT) and Program-Aided Language Models (PAL), which elicit LLMs to generate reasoning paths, thus promoting accuracy and attracting increasing attention. However, despite the success of these methods, the ability to deliver multi-step reasoning remains limited to a single language, making it challenging to generalize to other languages and hindering global development.In this work, we propose Cross-lingual Program-Aided Language Models (CrossPAL), a method for aligning reasoning programs across languages. In particular, our method delivers programs as intermediate reasoning steps in different languages through a double-step cross-lingual prompting mechanism inspired by the Program-Aided approach. In addition, we introduce Self-consistent CrossPAL (SCrossPAL) to ensemble different reasoning paths across languages. Our experimental evaluations show that our method significantly outperforms existing prompting methods, reducing the number of interactions and achieving state-of-the-art performance.

pdf bib abs
A Tree-of-Thoughts to Broaden Multi-step Reasoning across Languages
Leonardo Ranaldi | Giulia Pucci | Federico Ranaldi | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: NAACL 2024

Reasoning methods, best exemplified by the well-known Chain-of-Thought (CoT), empower the reasoning abilities of Large Language Models (LLMs) by eliciting them to solve complex tasks in a step-by-step manner. Although they are achieving significant success, the ability to deliver multi-step reasoning remains limited to English because of the imbalance in the distribution of pre-training data, which makes other languages a barrier. In this paper, we propose Cross-lingual Tree-of-Thoughts (Cross-ToT), a method for aligning Cross-lingual CoT reasoning across languages. The proposed method, through a self-consistent cross-lingual prompting mechanism inspired by the Tree-of-Thoughts approach, provides multi-step reasoning paths in different languages that, during the steps, lead to the final solution. Experimental evaluations show that our method significantly outperforms existing prompting methods by reducing the number of interactions and achieving state-of-the-art performance.

pdf bib abs
Empowering cross-lingual abilities of instruction-tuned large language models by translation-following demonstrations
Leonardo Ranaldi | Giulia Pucci | Andre Freitas
Findings of the Association for Computational Linguistics: ACL 2024

The language ability of Large Language Models (LLMs) is often unbalanced towards English because of the imbalance in the distribution of the pre-training data. This disparity is demanded in further fine-tuning and affecting the cross-lingual abilities of LLMs. In this paper, we propose to empower Instruction-tuned LLMs (It-LLMs) in languages other than English by building semantic alignment between them. Hence, we propose CrossAlpaca, an It-LLM with cross-lingual Instruction-following and Translation-following demonstrations to improve semantic alignment between languages. We validate our approach on the multilingual Question Answering (QA) benchmarks XQUAD and MLQA and adapted versions of MMLU and BBH.Our models, tested over six different languages, outperform the It-LLMs tuned on monolingual data. The final results show that instruction tuning on non-English data is not enough and that semantic alignment can be further improved by Translation-following demonstrations.

pdf bib abs
Does the Order Matter? Curriculum Learning over Languages
Leonardo Ranaldi | Giulia Pucci | Andrè Freitas
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Curriculum Learning (CL) has been emerged as an effective technique for improving the performances and reducing the cost of pre-training Large Language Models (LLMs). The efficacy of CL demonstrated in different scenarios is in the training LLMs by organizing examples from the simplest to the most complex. Although improvements have been shown extensively, this approach was used for pre-training, leaving novel fine-tuning approaches such as instruction-tuning unexplored. In this paper, we propose a novel complexity measure to empower the instruction-tuning method using the CL paradigm. To complement previous works, we propose cognitively motivated measures to determine the complexity of training demonstrations used in the instruction-tuning paradigm. Hence, we experiment with the proposed heuristics first in English and then in other languages. The downstream results show that delivering training examples by complexity ranking is also effective for instruction tuning, as it improves downstream performance while reducing costs. Furthermore, the technique can be easily transferred to languages other than English, e.g., Italian and French, without any adaptation, maintaining functionality and effectiveness.

2023

pdf bib
Are All Languages Equal? Curriculum Learning over Different Languages
Giulia Pucci | Leonardo Ranaldi | Fabio Massimo Zanzotto
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

pdf bib
Teasing LLMs Adapted to Italian
Leonardo Ranaldi | Giulia Pucci | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto | André Freitas
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

pdf bib
Does the English Matter? Elicit Cross-lingual Abilities of Large Language Models
Leonardo Ranaldi | Giulia Pucci
Proceedings of the 3rd Workshop on Multi-lingual Representation Learning (MRL)

pdf bib abs
Modeling Easiness for Training Transformers with Curriculum Learning
Leonardo Ranaldi | Giulia Pucci | Fabio Massimo Zanzotto
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Directly learning from complex examples is generally problematic for humans and machines. Indeed, a better strategy is exposing learners to examples in a reasonable, pedagogically-motivated order. Curriculum Learning (CL) has been proposed to import this strategy when training machine learning models. In this paper, building on Curriculum Learning, we propose a novel, linguistically motivated measure to determine example complexity for organizing examples during learning. Our complexity measure - LRC- is based on length, rarity, and comprehensibility. Our resulting learning model is CL-LRC, that is, CL with LRC. Experiments on downstream tasks show that CL-LRC outperforms existing CL and non-CL methods for training BERT and RoBERTa from scratch. Furthermore, we analyzed different measures, including perplexity, loss, and learning curve of different models pre-trained from scratch, showing that CL-LRC performs better than the state-of-the-art.

Co-authors

Alexandra Birch 1

Barry Haddow 1

Venues

mrl1

ws1