Kun Hu
2026
CORES: Code-Oriented Reasoning for Complex Text-to-SQL and Generalizable TableQA
Meng Zhang | Ruochun Jin | Yuanxi Peng | Wenjing Yang | Haotian Wang | Liting Sun | Kun Hu | Silin Yang | Zhang Ke-di
Findings of the Association for Computational Linguistics: ACL 2026
Text-to-SQL aims to bridge the gap between human intent and relational databases. While LLMs have shown proficiency in generating simple SQL queries, they struggle with complex analytical tasks. Moreover, models fine-tuned on SQL generation often suffer from catastrophic forgetting, losing the versatility of procedural reasoning and the ability to adhere to generation constraints. Inspired by the use of high-resource programming languages as LLM reasoning intermediaries, we propose CORES, a model that leverages Python as a procedural reasoning pivot to enhance both complex SQL generation and tabular reasoning. CORES decomposes complex queries into Python reasoning traces before generating the final SQL, bridging the gap between procedural reasoning and declarative expression. To internalize this reasoning capability, we fine-tune LLMs via GRPO with tailored process reward functions that mitigate the sparse feedback problem. We experimentally verify the effectiveness of CORES on six text-to-SQL benchmarks, where it outperforms baselines by 6.44% on average while maintaining good capability on three TableQA benchmarks.
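To make the idea of a procedural pivot concrete, the following is a minimal sketch, not the CORES pipeline itself. It answers one question twice: once as a step-by-step Python trace and once as a declarative SQL query, then shows that the two agree. The table, the question ("which customers spent more than a threshold in total?"), and the names `ROWS`, `python_trace`, and `sql_answer` are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical toy data (customer, amount); not taken from the paper.
ROWS = [("ann", 70), ("bob", 40), ("ann", 50), ("cy", 120), ("bob", 30)]


def python_trace(rows, threshold):
    """Procedural reasoning: accumulate per-customer totals, then filter."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    # Keep customers whose total exceeds the threshold, sorted by name.
    return sorted((c, t) for c, t in totals.items() if t > threshold)


def sql_answer(rows, threshold):
    """Declarative form of the same logic: GROUP BY, then HAVING."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = (
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer HAVING SUM(amount) > ? ORDER BY customer"
    )
    result = conn.execute(query, (threshold,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(python_trace(ROWS, 100))
    print(sql_answer(ROWS, 100))
```

The accumulation loop corresponds to `GROUP BY` with `SUM`, and the final filter corresponds to `HAVING`; this is the kind of procedural-to-declarative correspondence the abstract refers to, shown here only on a toy case.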
2022
OTExtSum: Extractive Text Summarisation with Optimal Transport
Peggy Tang | Kun Hu | Rui Yan | Lei Zhang | Junbin Gao | Zhiyong Wang
Findings of the Association for Computational Linguistics: NAACL 2022
Extractive text summarisation aims to select salient sentences from a document to form a short yet informative summary. While learning-based methods have achieved promising results, they have several limitations, such as dependence on expensive training and lack of interpretability. Therefore, in this paper, we propose a novel non-learning-based method, the Optimal Transport Extractive Summariser (OTExtSum), which for the first time formulates text summarisation as an Optimal Transport (OT) problem. Optimal sentence extraction is conceptualised as obtaining an optimal summary that minimises the transportation cost to a given document with respect to their semantic distributions. This cost is defined by the Wasserstein distance and is used to measure the summary’s semantic coverage of the original document. Comprehensive experiments on four challenging and widely used datasets (MultiNews, PubMed, BillSum, and CNN/DM) demonstrate that our proposed method outperforms the state-of-the-art non-learning-based methods and several recent learning-based methods in terms of the ROUGE metric.
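The selection principle described above can be illustrated with a deliberately simplified sketch; it is not the OTExtSum implementation. Two simplifying assumptions are made here: the semantic distribution is replaced by a normalised token-frequency distribution, and the ground cost between tokens is 0 if they are identical and 1 otherwise. Under that 0/1 cost, the optimal transport cost equals the total variation distance, so no solver is needed. The exhaustive search over sentence subsets and the names `distribution`, `ot_cost_01`, and `best_summary` are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def distribution(sentences):
    """Normalised token-frequency distribution over whitespace tokens."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def ot_cost_01(p, q):
    """Optimal transport cost under a 0/1 ground cost.

    With cost 0 for identical tokens and 1 otherwise, the OT cost is the
    total variation distance: half the L1 difference of the distributions.
    """
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in vocab)


def best_summary(sentences, k):
    """Pick the k sentences whose distribution is cheapest to transport
    to the whole document's distribution (brute force; toy sizes only)."""
    doc = distribution(sentences)
    best = min(
        combinations(range(len(sentences)), k),
        key=lambda idx: ot_cost_01(
            distribution([sentences[i] for i in idx]), doc
        ),
    )
    return [sentences[i] for i in best]


if __name__ == "__main__":
    doc = ["cats chase mice", "cats chase mice", "dogs bark"]
    print(best_summary(doc, 1))
```

A semantic ground cost (for example, distances between token embeddings) would let near-synonyms count as cheap to transport, which the 0/1 cost used in this sketch cannot capture.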