2023
Importance of Synthesizing High-quality Data for Text-to-SQL Parsing
Yiqun Hu | Yiyun Zhao | Jiarong Jiang | Wuwei Lan | Henghui Zhu | Anuj Chauhan | Alexander Hanbo Li | Lin Pan | Jun Wang | Chung-Wei Hang | Sheng Zhang | Jiang Guo | Mingwen Dong | Joseph Lilien | Patrick Ng | Zhiguo Wang | Vittorio Castelli | Bing Xiang
Findings of the Association for Computational Linguistics: ACL 2023
There has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not improve further on popular benchmarks when trained with augmented synthetic data. We observed three shortcomings: illogical synthetic SQL queries from independent column sampling, arbitrary table joins, and language gaps between the synthesized SQL and natural language question (NLQ) pairs. To address these issues, we propose a novel synthesis framework that imposes strong typing constraints, incorporates key relationships from the schema, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated NLQ. When existing powerful text-to-SQL parsers are pretrained on our high-quality synthesized data, these models achieve significant accuracy boosts and new state-of-the-art performance on Spider. We also demonstrate the effectiveness of our techniques with ablation studies.
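Although the paper's full pipeline is not reproduced here, the schema-distance-weighted column sampling idea lends itself to a compact illustration. Below is a minimal sketch, assuming a schema graph with tables as nodes and foreign keys as edges; the exponential decay weighting, function names, and toy schema are illustrative assumptions, not the paper's actual implementation.

```python
import random
from collections import deque

def schema_distances(fk_edges, start_table):
    """BFS over the schema graph (tables as nodes, foreign-key links as
    edges), giving the hop distance from start_table to each table."""
    graph = {}
    for a, b in fk_edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {start_table: 0}
    queue = deque([start_table])
    while queue:
        t = queue.popleft()
        for nb in graph.get(t, ()):
            if nb not in dist:
                dist[nb] = dist[t] + 1
                queue.append(nb)
    return dist

def sample_columns(columns, fk_edges, anchor_table, k=2, decay=0.5):
    """Sample k (table, column) pairs, down-weighting columns whose table
    is far from the anchor table, so joins stay between related tables."""
    dist = schema_distances(fk_edges, anchor_table)
    candidates = [(t, c) for (t, c) in columns if t in dist]
    weights = [decay ** dist[t] for (t, c) in candidates]
    return random.choices(candidates, weights=weights, k=k)

# Toy schema: a 'singer' column is twice as likely to be drawn as a
# 'concert' column, which sits one foreign-key hop from the anchor.
columns = [("singer", "name"), ("singer", "age"), ("concert", "year")]
print(sample_columns(columns, [("singer", "concert")], "singer"))
```

Weighting candidates by graph distance keeps sampled columns within closely related tables, which directly targets the arbitrary-join and independent-sampling problems the abstract describes.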
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
Xingyu Fu | Sheng Zhang | Gukyeong Kwon | Pramuditha Perera | Henghui Zhu | Yuhao Zhang | Alexander Hanbo Li | William Yang Wang | Zhiguo Wang | Vittorio Castelli | Patrick Ng | Dan Roth | Bing Xiang
Findings of the Association for Computational Linguistics: ACL 2023
The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs using world knowledge. Recently, pre-trained Language Models (PLMs) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources. However, these methods suffer from low knowledge coverage caused by PLM bias (the tendency to generate certain tokens over others regardless of prompt changes) and from a high dependency on PLM quality (only models using GPT-3 achieve the best results). To address these challenges, we propose RASO: a new VQA pipeline that, for the first time, deploys a generate-then-select strategy guided by world knowledge. Rather than following the de facto standard of training a multi-modal model that directly generates the VQA answer, RASO first uses a PLM to generate all the possible answers, and then trains a lightweight answer selection model to pick the correct one. As shown in our analysis, RASO expands the knowledge coverage from in-domain training data by a large margin. We provide extensive experimentation and show the effectiveness of our pipeline by advancing the state of the art by 4.1% on OK-VQA, without additional computation cost.
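The two-stage structure of a generate-then-select pipeline can be sketched in a few lines. This is a minimal sketch, with `plm.generate` and `selector.score` as hypothetical interfaces standing in for the PLM prompting step and the lightweight selection model; RASO's actual prompting format and selector architecture are not specified here.

```python
from dataclasses import dataclass

@dataclass
class VQAExample:
    caption: str    # stand-in for the visual input
    question: str

def generate_candidates(plm, ex, n=10):
    """Step 1 (generate): prompt a PLM for n possible answers.
    `plm.generate` is a hypothetical interface, not a real library call."""
    prompt = f"Context: {ex.caption}\nQ: {ex.question}\nA:"
    return [plm.generate(prompt, sample_id=i).strip() for i in range(n)]

def select_answer(selector, ex, candidates):
    """Step 2 (select): a lightweight model scores each candidate answer
    against the question and visual context; the top-scoring one wins."""
    scores = [selector.score(ex.question, ex.caption, c) for c in candidates]
    return max(zip(candidates, scores), key=lambda p: p[1])[0]
```

Splitting generation from selection is what lets the heavyweight PLM contribute broad world knowledge while only the small selection model needs task-specific training.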
Benchmarking Diverse-Modal Entity Linking with Generative Models
Sijia Wang | Alexander Hanbo Li | Henghui Zhu | Sheng Zhang | Pramuditha Perera | Chung-Wei Hang | Jie Ma | William Yang Wang | Zhiguo Wang | Vittorio Castelli | Bing Xiang | Patrick Ng
Findings of the Association for Computational Linguistics: ACL 2023
Entities can be expressed in diverse formats, such as texts, images, or column names and cell values in tables. While existing entity linking (EL) models work well in single-modality configurations, such as text-only EL, visual grounding, or schema linking, it is more challenging to design a unified model for diverse modality configurations. To bring various modality configurations together, we constructed a benchmark for diverse-modal EL (DMEL) from existing EL datasets, covering all three modalities: text, image, and table. To approach the DMEL task, we proposed a generative diverse-modal model (GDMM) following a multimodal encoder-decoder paradigm. Pre-training GDMM with rich corpora builds a solid foundation for DMEL without storing the entire KB for inference. Fine-tuning GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average. Additionally, extensive error analyses are conducted to highlight the challenges of DMEL, facilitating future research on this task.
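The generative flavor of this approach can be sketched with a generic Hugging Face style encoder-decoder (e.g., BART or T5): a candidate entity is scored by the likelihood of the decoder generating its name from the encoded mention context. This is a minimal, text-only sketch; GDMM's multimodal encoder, pre-training, and decoding strategy are not reflected here, and the candidate-list setup is an illustrative assumption.

```python
import torch

def entity_score(model, tokenizer, mention_context, entity_name):
    """Score a candidate entity as the decoder likelihood of generating
    its name from the encoded mention context (generic seq2seq scoring)."""
    enc = tokenizer(mention_context, return_tensors="pt")
    dec = tokenizer(entity_name, return_tensors="pt")
    with torch.no_grad():
        out = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    labels=dec.input_ids)
    # out.loss is the mean token cross-entropy; rescale by length to
    # recover the total sequence log-likelihood, then flip the sign.
    return -out.loss.item() * dec.input_ids.size(1)

def link(model, tokenizer, mention_context, candidates):
    """Link the mention to the candidate with the highest generation score."""
    return max(candidates,
               key=lambda e: entity_score(model, tokenizer, mention_context, e))
```

Because the entity is produced (or scored) by the decoder as a name string, inference needs no stored embedding per KB entry, echoing the abstract's point about not storing the entire KB.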