Lu Wang

2026

Skill-Aware Data Selection and Fine-Tuning for Data-Efficient Reasoning Distillation
Lechen Zhang | Yunxiang Zhang | Wei Hu | Lu Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Large reasoning models such as DeepSeek-R1 and their distilled variants achieve strong performance on complex reasoning tasks. Yet, distilling these models often demands large-scale data for supervised fine-tuning (SFT), motivating the pursuit of data-efficient training methods. To address this, we propose a skill-centric distillation framework that efficiently transfers reasoning ability to weaker models with two components: (1) Skill-based data selection, which prioritizes examples targeting the student model’s weaker skills, and (2) Skill-aware fine-tuning, which encourages explicit skill decomposition during problem solving. With only 1,000 training examples selected from a 100K teacher-generated corpus, our method surpasses random SFT baselines by +1.6% on Qwen3-4B and +1.4% on Qwen3-8B across five mathematical reasoning benchmarks. Further analysis confirms that these gains concentrate on skills emphasized during training, highlighting the effectiveness of skill-centric training for efficient reasoning distillation.

Do LLMs Really Need 10+ Thoughts for “Find the Time 1000 Days Later”? Towards Structural Understanding of LLM Overthinking
Xinliang Frederick Zhang | Anhad Mohananey | Alexandra Chronopoulou | Pinelopi Papalampidi | Somit Gupta | Tsendsuren Munkhdalai | Lu Wang | Shyam Upadhyay
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Models employing long chain-of-thought (CoT) reasoning have shown superior performance on complex reasoning tasks. Yet this capability introduces a critical and often overlooked inefficiency, overthinking: models often engage in unnecessarily extensive reasoning even for simple queries, incurring significant computation without accuracy improvements. While prior work has explored solutions to mitigate overthinking, a fundamental gap remains in our understanding of its underlying causes. Most existing analyses are limited to superficial, profiling-based observations and fail to examine LLMs’ inner workings. To bridge this gap, this study introduces TRACE, a systematic, fine-grained analyzer of LLMs’ thought processes. We first benchmark the overthinking issue, confirming that long-thinking models are five to twenty times slower on simple tasks with no substantial gains. We then use TRACE to decompose the thought process into minimally complete sub-thoughts. Next, by inferring discourse relationships among sub-thoughts, we construct granular thought progression graphs and subsequently identify common thinking patterns for topically similar queries. Our analysis reveals two major patterns for open-weight thinking models: Explorer and Late Landing. This finding provides evidence that over-verification and over-exploration are the primary drivers of overthinking in LLMs. Grounded in thought structures, we propose a utility-based definition of overthinking that moves beyond length-based metrics. This revised definition offers a more insightful understanding of LLMs’ thought progression, as well as practical guidelines for principled overthinking management.

CASPER in the Machine: Insights into Character Variety in LLM-Generated Stories
Anneliese Brei | Abhisheik Sharma | Nicholas Sanaie | Lu Wang | Snigdha Chaturvedi
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As LLM-generated text is increasingly used, especially in fictional domains, we explore how much LLM-generated stories differ from human-written stories. In this work, we focus on characters. We borrow definitions from narratology to analyze 8 intricate category-pairs of character, such as stylization and wholeness. These category-pairs consider more than just basic characteristics: they assess how characters are portrayed within their stories. After automatically inferring categories of characters within both LLM-generated and human-written stories, we compare and contrast these two sets of stories. We consider the following overarching questions: (1) Do LLM-generated and human-written stories have similar characters? and (2) Do LLMs generate stories with a variety of characters? Our analysis includes research questions that focus on stories generated by popular LLMs and recently published human-written stories. We describe a number of interesting similarities, differences, and key takeaways.

Co-authors

Anhad Mohananey 1

Tsendsuren Munkhdalai 1

Pinelopi Papalampidi 1

Xinliang Frederick Zhang 1

Yunxiang Zhang 1

Venues

ACL 3