Abstract
Existing conversational question answering (CQA) datasets have been usually constructed from unstructured texts in English. In this paper, we propose Tab-CQA, a tabular CQA dataset created from Chinese financial reports that are extracted from listed companies in a wide range of different sectors in the past 30 years. From these reports, we select 2,463 tables, and manually generate 2,463 conversations with 35,494 QA pairs. Additionally, we select 4,578 tables, from which 4,578 conversations with 73,595 QA pairs are automatically created via a template-based method. With the manually- and automatically-generated conversations, Tab-CQA contains answerable and unanswerable questions. For the answerable questions, we further diversify them to cover a wide range of skills, e.g., table retrieval, fact checking, numerical reasoning, so as to accommodate real-world scenarios. We further propose two different tabular CQA models, a text-based model and an operation-based model, and evaluate them on Tab-CQA. Experiment results show that Tab-CQA is a very challenging dataset, where a huge performance gap exists between human and neural models. We will publicly release Tab-CQA as a benchmark testbed to promote further research on Chinese tabular CQA.- Anthology ID:
- 2023.acl-industry.20
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 196–207
- Language:
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2023.acl-industry.20/
- DOI:
- 10.18653/v1/2023.acl-industry.20
- Cite (ACL):
- Chuang Liu, Junzhuo Li, and Deyi Xiong. 2023. Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 196–207, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports (Liu et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2023.acl-industry.20.pdf