Manoel Horta Ribeiro

2026

The Illusion of Insight in Reasoning Models
Liv G. d'Aliberti | Manoel Horta Ribeiro
Findings of the Association for Computational Linguistics: ACL 2026

Do reasoning models have "Aha!" moments?Prior work suggests that models like DeepSeek-R1-Zero undergo sudden mid-trace realizations that lead to accurate outputs, implying an intrinsic capacity for self-correction. Yet, it remains unclear whether such intrinsic shifts in reasoning strategy actually improve performance.Here, we study mid-reasoning shifts and instrument training runs to detect them. Our analysis spans 1M+ reasoning traces, hundreds of training checkpoints, three reasoning domains, and multiple decoding temperatures and model architectures.We find that reasoning shifts are rare, do not become more frequent with training, and seldom improve accuracy, indicating that they do not correspond to prior perceptions of model insight. However, their effect varies with model uncertainty. Building on this finding, we show that artificially triggering extrinsic shifts under high entropy reliably improves accuracy. Our results show that mid-reasoning shifts are symptoms of unstable inference behavior rather than an intrinsic mechanism for self-correction.

2024

pdf bib abs

Can Language Models Recognize Convincing Arguments?
Paula Rescala | Manoel Horta Ribeiro | Tiancheng Hu | Robert West
Findings of the Association for Computational Linguistics: EMNLP 2024

The capabilities of large language models (LLMs) have raised concerns about their potential to create and propagate convincing narratives. Here, we study their performance in detecting convincing arguments to gain insights into LLMs’ persuasive capabilities without directly engaging in experimentation with humans. We extend a dataset by Durmus and Cardie (2018) with debates, votes, and user traits and propose tasks measuring LLMs’ ability to (1) distinguish between strong and weak arguments, (2) predict stances based on beliefs and demographic characteristics, and (3) determine the appeal of an argument to an individual based on their traits. We show that LLMs perform on par with humans in these tasks and that combining predictions from different LLMs yields significant performance gains, surpassing human performance. The data and code released with this paper contribute to the crucial effort of continuously evaluating and monitoring LLMs’ capabilities and potential impact. (https://go.epfl.ch/persuasion-llm)

Co-authors

Venues

Findings2

Fix author