Zhongqiu Li
2025
UCS-SQL: Uniting Content and Structure for Enhanced Semantic Bridging In Text-to-SQL
Zhenhe Wu
|
Zhongqiu Li
|
JieZhangChinaTele JieZhangChinaTele
|
Zhongjiang He
|
Jian Yang
|
Yu Zhao
|
Ruiyu Fang
|
Bing Wang
|
Hongyan Xie
|
Shuangyong Song
|
Zhoujun Li
Findings of the Association for Computational Linguistics: ACL 2025
With the rapid advancement of large language models (LLMs), recent researchers have increasingly focused on the superior capabilities of LLMs in text/code understanding and generation to tackle text-to-SQL tasks. Traditional approaches adopt schema linking to first eliminate redundant tables and columns and prompt LLMs for SQL generation. However, they often struggle with accurately identifying corresponding tables and columns, due to discrepancies in naming conventions between natural language questions (NL) and database schemas. Besides, existing methods overlook the challenge of effectively transforming structure information from NL into SQL. To address these limitations, we introduce UCS-SQL, a novel text-to-SQL framework, uniting both content and structure pipes to bridge the gap between NL and SQL. Specifically, the content pipe focuses on identifying key content within the original content, while the structure pipe is dedicated to transforming the linguistic structure from NL to SQL. Additionally, we strategically selects few-shot examples by considering both the SQL Skeleton and Question Expression (SS-QE selection method), thus providing targeted examples for SQL generation. Experimental results on BIRD and Spider demonstrate the effectiveness of our UCS-SQL framework.
2022
Unregulated Chinese-to-English Data Expansion Does NOT Work for Neural Event Detection
Zhongqiu Li
|
Yu Hong
|
Jie Wang
|
Shiming He
|
Jianmin Yao
|
Guodong Zhou
Proceedings of the 29th International Conference on Computational Linguistics
We leverage cross-language data expansion and retraining to enhance neural Event Detection (abbr., ED) on English ACE corpus. Machine translation is utilized for expanding English training set of ED from that of Chinese. However, experimental results illustrate that such strategy actually results in performance degradation. The survey of translations suggests that the mistakenly-aligned triggers in the expanded data negatively influences the retraining process. We refer this phenomenon to “trigger falsification”. To overcome the issue, we apply heuristic rules for regulating the expanded data, fixing the distracting samples that contain the falsified triggers. The supplementary experiments show that the rule-based regulation is beneficial, yielding the improvement of about 1.6% F1-score for ED. We additionally prove that, instead of transfer learning from the translated ED data, the straight data combination by random pouring surprisingly performs better.