Breaking the Illusion of Reasoning in Polish LLMs: Quality over Quantity of Thought
Dzmitry Pihulski | Mikołaj Langner | Jan Eliasz | Przemyslaw Kazienko | Jan Kocon | Teddy Ferdinan
Findings of the Association for Computational Linguistics: EACL 2026
Recent advances in large language models (LLMs) have introduced explicit reasoning capabilities, yet the factors that truly drive their improved performance remain unclear. In this work, we disentangle the effects of reasoning quality and sequence length by fine-tuning 8B models on several Polish variants of the Mixture-of-Thoughts (MoT-PL) dataset, each representing a distinct reasoning style: *Detailed*, *Summarized*, *BabyThink*, and *Lengthy*. We find that the model trained on high-quality reasoning traces achieves better average performance than all other models; neither *longer reasoning of similar quality* nor *low-quality reasoning of similar length* achieves comparable gains. Qualitative and quantitative analyses further reveal that reasoning clarity, rather than verbosity, is the dominant factor driving model performance. These findings underscore the importance of reasoning content quality in LLM training and provide new insights for designing more effective reasoning-oriented datasets and models.