Sophia Simeng Han
2026
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
Yuan Sui | Yufei He | Tri Cao | Sophia Simeng Han | Yulin Chen | Bryan Hooi
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advancements in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to “think about how to think”. It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy. It learns to evaluate the current state of the LLM’s reasoning and determine the strategy that is most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps avoid exploring unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9-12% in accuracy, while reducing inference time by 28-35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics
Jinu Lee | Kyoung-Woon On | Sophia Simeng Han | Arman Cohan | Julia Hockenmaier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Evaluating the quality of LLM-generated reasoning traces in expert domains (e.g., law) is essential for ensuring credibility and explainability, yet remains challenging due to the inherent complexity of such reasoning tasks. We introduce LEGIT (LEGal Issue Trees), a novel large-scale (24K instances) expert-level legal reasoning dataset with an emphasis on reasoning trace evaluation. We convert court judgments into hierarchical trees of opposing parties’ arguments and the court’s conclusions, which serve as rubrics for evaluating the issue coverage and correctness of the reasoning traces. We verify the reliability of these rubrics via human expert annotations and comparison with coarse, less informative rubrics. Using the LEGIT dataset, we show that (1) LLMs’ legal reasoning ability is seriously affected by both legal issue coverage and correctness, and that (2) retrieval-augmented generation (RAG) and RL with rubrics bring complementary benefits for legal reasoning abilities, where RAG improves overall reasoning capability, whereas RL improves correctness albeit with reduced coverage.
2025
Proceedings of the 9th Widening NLP Workshop
Chen Zhang | Emily Allaway | Hua Shen | Lesly Miculicich | Yinqiao Li | Meryem M'hamdi | Peerat Limkonchotiwat | Richard He Bai | Santosh T.Y.S.S. | Sophia Simeng Han | Surendrabikram Thapa | Wiem Ben Rim
Proceedings of the 9th Widening NLP Workshop
CourtReasoner: Can LLM Agents Reason Like Judges?
Sophia Simeng Han | Yoshiki Takashima | Shannon Zejiang Shen | Chen Liu | Yixin Liu | Roque K. Thuo | Sonia Knowlton | Ruzica Piskac | Scott J Shapiro | Arman Cohan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
LLMs are increasingly applied in the legal domain in tasks such as summarizing legal texts and providing basic legal advice. Yet their capacity to draft full judicial analyses, such as generating entire judicial reasoning sections in U.S. court decisions, remains under-explored. Given the continued adoption of LLMs and the significance of law to society at large, measuring LLMs’ legal reasoning capabilities is a pressing task. We propose CourtReasoner, a novel expert-annotated judicial reasoning benchmark for evaluating LLM agents’ capabilities in complex legal reasoning. Sourcing U.S. court opinions, we construct benchmarks that measure LLMs’ ability to construct goal-oriented legal reasoning. CourtReasoner measures the agent’s ability to argue both ways in a legal dispute, rather than simple Q/A. Our results show that more than 60% of frontier model outputs contain invalid arguments and more than 53% contain irrelevant citations when conducting complex legal reasoning. We also introduce a meta-evaluation benchmark to provide insights into the capabilities of LLMs as evaluators of legal reasoning. We will release our data, code and full annotation guidelines publicly for future research.
Co-authors
- Arman Cohan 2
- Emily Allaway 1
- Richard He Bai 1
- Tri Cao 1
- Yulin Chen 1
- Yufei He 1
- Julia Hockenmaier 1
- Bryan Hooi 1
- Sonia Knowlton 1
- Jinu Lee 1
- Yinqiao Li 1
- Peerat Limkonchotiwat 1
- Chen Liu 1
- Yixin Liu 1
- Lesly Miculicich Werlen 1
- Meryem M’hamdi 1
- Kyoung-Woon On 1
- Ruzica Piskac 1
- Wiem Ben Rim 1
- Scott J Shapiro 1
- Hua Shen 1
- Shannon Zejiang Shen 1
- Yuan Sui 1
- Santosh T.Y.S.S. 1
- Yoshiki Takashima 1
- Surendrabikram Thapa 1
- Roque K. Thuo 1
- Chen Zhang 1