TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, Jin Zeng


Abstract
Complex reasoning over tabular data is crucial in real-world data analysis, yet large language models (LLMs) often underperform because of complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a three-agent framework consisting of: (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that generates executable code to derive the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables. Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with accuracy improvements of 8.79%, 6.08%, and 19.87% on TAT-QA, TableBench, and CalTab151, respectively. Moreover, our framework integrates seamlessly with mainstream LLMs, providing a robust solution for complex tabular numerical reasoning.
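The decompose, sanitize, and reason flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: in TabDSR each stage is an LLM agent, whereas here the decomposer and sanitizer are hand-written stand-ins and the "generated" program is hard-coded. All function names (decompose, to_number, sanitize, run_program), the splitting rule, and the sample table are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Union

Cell = Union[float, str]


def decompose(question: str) -> List[str]:
    """Stand-in for the query decomposer (an LLM agent in the paper).

    Assumption: sub-questions are joined by the literal phrase ' and then '.
    """
    return [part.strip() for part in question.split(" and then ") if part.strip()]


def to_number(cell: str) -> Cell:
    """Convert strings such as '$1,200' or '(45)' to floats; leave other text as-is."""
    text = cell.strip().replace(",", "").replace("$", "")
    # Accounting-style parentheses are read as a negative value.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = float(text)
    except ValueError:
        return cell.strip()
    return -value if negative else value


def sanitize(table: List[Dict[str, str]]) -> List[Dict[str, Cell]]:
    """Stand-in for the table sanitizer: drop empty rows and normalize cells."""
    cleaned = []
    for row in table:
        if all(not value.strip() for value in row.values()):
            continue  # skip rows with no content at all
        cleaned.append({key: to_number(value) for key, value in row.items()})
    return cleaned


def run_program(program: str, table: List[Dict[str, Cell]]) -> object:
    """Stand-in for the PoT reasoner's execution step.

    The program is expected to read `table` and assign its result to `answer`.
    """
    namespace = {"table": table}
    exec(program, namespace)  # only safe for trusted code; see the note below
    return namespace["answer"]


raw_table = [
    {"year": " FY2019 ", "revenue": "$1,200"},
    {"year": "FY2020", "revenue": "$1,500"},
    {"year": "", "revenue": ""},
]

# In a real PoT setup an LLM would write this program; here it is hard-coded.
program = """
rev = {row["year"]: row["revenue"] for row in table}
answer = round((rev["FY2020"] - rev["FY2019"]) / rev["FY2019"] * 100, 2)
"""

steps = decompose(
    "What was revenue in FY2019 and FY2020 and then what is the percentage change?"
)
clean_table = sanitize(raw_table)
print(steps)
print(run_program(program, clean_table))  # prints 25.0
```

Because the reasoner executes generated code, a practical system would run it in a sandbox or a restricted subprocess rather than calling exec directly in the host process.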
Anthology ID:
2025.findings-emnlp.169
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3172–3196
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.169/
DOI:
10.18653/v1/2025.findings-emnlp.169
Cite (ACL):
Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, and Jin Zeng. 2025. TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 3172–3196, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data (Jiang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.169.pdf
Checklist:
 2025.findings-emnlp.169.checklist.pdf