Jiaxin Peng


2025

T2R-BENCH: A Benchmark for Real World Table-to-Report Task
Jie Zhang | Changzai Pan | Sishi Xiong | Kaiwen Wei | Yu Zhao | Xiangyu Li | Jiaxin Peng | Xiaoyan Gu | Jian Yang | Wenhan Chang | Zhenhe Wu | Jiang Zhong | Shuangyong Song | Xuelong Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Extensive research has been conducted to explore the capabilities of large language models (LLMs) in table reasoning. However, the essential task of transforming table information into reports remains a significant challenge for industrial applications. This task is plagued by two critical issues: 1) the complexity and diversity of tables lead to suboptimal reasoning outcomes; and 2) existing table benchmarks lack the capacity to adequately assess the practical application of this task. To fill this gap, we propose the table-to-report task and construct a bilingual benchmark named T2R-bench, in which the key information flows from the tables to the reports. The benchmark comprises 457 industrial tables, all derived from real-world scenarios and encompassing 19 industry domains as well as four types of industrial tables. Furthermore, we propose novel evaluation criteria to fairly measure the quality of report generation. Experimental results show that even the best-performing model, Deepseek-R1, achieves only a 62.71% overall score, indicating that LLMs still have room for improvement on T2R-bench.