TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data
Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, Jin Zeng
Abstract
Complex reasoning over tabular data is crucial in real-world data analysis, yet large language models (LLMs) often underperform due to complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a three-agent framework consisting of: (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that generates executable code to derive the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables. Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with 8.79%, 6.08%, and 19.87% accuracy improvement on TAT-QA, TableBench, and CalTab151, respectively. Moreover, our framework integrates seamlessly with mainstream LLMs, providing a robust solution for complex tabular numerical reasoning. These findings highlight the effectiveness of our framework in enhancing LLM performance for complex tabular numerical reasoning.- Anthology ID:
- 2025.findings-emnlp.169
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 3172–3196
- Language:
- URL:
- https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.169/
- DOI:
- 10.18653/v1/2025.findings-emnlp.169
- Cite (ACL):
- Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, and Jin Zeng. 2025. TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 3172–3196, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data (Jiang et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.169.pdf