Program Synthesis for Complex QA on Charts via Probabilistic Grammar Based Filtered Iterative Back-Translation
Shabbirhussain Bhaisaheb, Shubham Paliwal, Rajaswa Patil, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff
Abstract
Answering complex reasoning questions from chart images is a challenging problem that requires a combination of natural language understanding, fine-grained perception, and analytical reasoning. Current chart-based Question Answering (QA) approaches largely address structural, visual, or simple data-retrieval questions with fixed-vocabulary answers, and they perform poorly on reasoning queries. We focus on answering realistic, complex, reasoning-based questions whose answers must be computed rather than selected from a fixed set of choices. Our approach employs a neural semantic parser to transform Natural Language (NL) questions into SQL programs, which are then executed on a standardized schema populated from the extracted chart contents. In the absence of program annotations, i.e., in a weak supervision setting, we obtain initial SQL predictions from a pre-trained CodeT5 semantic parser and employ Filtered Iterative Back-Translation (FIBT) to iteratively augment our NL-SQL training set. The forward (neural semantic parser) and backward (language model) models are initially trained on an external NL-SQL dataset. We iteratively move towards the NL query distribution by generating NL questions from the synthesized SQL programs using a Probabilistic Context-Free Grammar (PCFG) whose production rule probabilities are induced to be inversely proportional to the probabilities in the training data. We filter out generated NL queries with mismatched structures and compositions. Our FIBT approach achieves State-of-the-Art (SOTA) results on reasoning-based queries in the PlotQA dataset, yielding a test accuracy of 60.44% and surpassing the previous baselines by a large margin.
- Anthology ID:
- 2023.findings-eacl.189
- Volume:
- Findings of the Association for Computational Linguistics: EACL 2023
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editors:
- Andreas Vlachos, Isabelle Augenstein
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2501–2515
- URL:
- https://aclanthology.org/2023.findings-eacl.189
- DOI:
- 10.18653/v1/2023.findings-eacl.189
- Cite (ACL):
- Shabbirhussain Bhaisaheb, Shubham Paliwal, Rajaswa Patil, Manasi Patwardhan, Lovekesh Vig, and Gautam Shroff. 2023. Program Synthesis for Complex QA on Charts via Probabilistic Grammar Based Filtered Iterative Back-Translation. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2501–2515, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- Program Synthesis for Complex QA on Charts via Probabilistic Grammar Based Filtered Iterative Back-Translation (Bhaisaheb et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-eacl.189.pdf
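The abstract describes a PCFG whose production-rule probabilities are set inversely proportional to how often each rule appears in the training data, so that rarely seen constructions are sampled more often. Below is a minimal Python sketch of that idea only, under stated assumptions: the toy grammar over SQL-like tokens, the rule counts, and the add-one smoothing are all illustrative, and `inverse_probabilities` and `sample` are hypothetical helper names, not the authors' grammar or implementation.

```python
import random
from collections import Counter

# Hypothetical toy grammar (an assumption, not the paper's grammar):
# each nonterminal maps to a list of right-hand sides; any symbol that is
# not a key of the dict is treated as a terminal token.
GRAMMAR = {
    "QUERY": [["SELECT", "AGG", "(", "COL", ")", "FROM", "T"],
              ["SELECT", "COL", "FROM", "T", "WHERE", "COND"]],
    "AGG": [["MAX"], ["MIN"], ["AVG"], ["SUM"]],
    "COL": [["x"], ["y"]],
    "COND": [["COL", ">", "VAL"], ["COL", "<", "VAL"]],
    "VAL": [["0"], ["10"]],
}


def inverse_probabilities(grammar, rule_counts, smoothing=1.0):
    """Per-nonterminal rule probabilities proportional to 1 / (count + smoothing).

    `rule_counts` maps (lhs, tuple(rhs)) to how often that rule was observed
    in the training programs; unseen rules default to 0 and therefore get the
    largest weight. Weights are normalized separately for each left-hand side.
    """
    probs = {}
    for lhs, rhss in grammar.items():
        weights = [1.0 / (rule_counts.get((lhs, tuple(rhs)), 0) + smoothing)
                   for rhs in rhss]
        total = sum(weights)
        probs[lhs] = [w / total for w in weights]
    return probs


def sample(symbol, grammar, probs, rng):
    """Expand `symbol` top-down, drawing each production from `probs`."""
    if symbol not in grammar:
        return [symbol]  # terminal token
    rhs = rng.choices(grammar[symbol], weights=probs[symbol], k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample(child, grammar, probs, rng))
    return tokens


# Illustrative counts (made up for the demo): MAX is frequent, SUM is unseen.
demo_counts = Counter({("AGG", ("MAX",)): 9, ("AGG", ("MIN",)): 4,
                       ("AGG", ("AVG",)): 1})
demo_probs = inverse_probabilities(GRAMMAR, demo_counts)
print(demo_probs["AGG"])
print(" ".join(sample("QUERY", GRAMMAR, demo_probs, random.Random(0))))
```

With these made-up counts the unseen `SUM` rule receives the highest probability and the frequent `MAX` rule the lowest, which is the shift away from the training distribution that the abstract describes. The smoothing term is one simple way to keep unseen rules finite; the paper may handle this differently.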