TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

Fengbin Zhu; Wenqiang Lei; Youcheng Huang; Chao Wang; Shuo Zhang; Jiancheng Lv; Fuli Feng; Tat-Seng Chua

doi:10.18653/v1/2021.acl-long.254

Abstract

Hybrid data combining both tabular and textual content (e.g., financial reports) are pervasive in the real world. However, Question Answering (QA) over such hybrid data has been largely neglected in existing research. In this work, we extract samples from real financial reports to build a new large-scale QA dataset containing both Tabular And Textual data, named TAT-QA, where numerical reasoning is usually required to infer the answer, such as addition, subtraction, multiplication, division, counting, comparison/sorting, and their compositions. We further propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text. It adopts sequence tagging to extract relevant cells from the table along with relevant spans from the text to infer their semantics, and then applies symbolic reasoning over them with a set of aggregation operators to arrive at the final answer. According to our experiments on TAT-QA, TAGOP achieves 58.0% in F1, an 11.1% absolute increase over the previous best baseline model. However, this result still lags far behind the performance of expert humans, i.e., 90.8% in F1. This demonstrates that TAT-QA is very challenging and can serve as a benchmark for training and testing powerful QA models that address hybrid-form data.
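The abstract describes TAGOP's pipeline only at a high level: sequence tagging selects table cells and text spans, and an aggregation operator is then applied to them. The Python sketch below illustrates that tag-then-aggregate flow under stated assumptions. The tags are taken as given (TAGOP predicts them with a model), and the operator names (`sum`, `difference`, `count`, ...), the helpers `parse_number`, `extract_tagged` and `reason`, and the number-parsing rules are all hypothetical. They loosely mirror the reasoning types listed in the abstract rather than TAGOP's actual operator set or code.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical arithmetic operators, loosely modeled on the reasoning types the
# abstract lists; this is NOT TAGOP's exact operator set.
ARITHMETIC_OPERATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda v: sum(v),
    "difference": lambda v: v[0] - v[1],
    "multiplication": lambda v: math.prod(v),
    "division": lambda v: v[0] / v[1],
    "average": lambda v: sum(v) / len(v),
}


def parse_number(text: str) -> float:
    """Parse a financial-report number; assumes '(300)' means -300."""
    cleaned = text.strip().replace(",", "").replace("$", "").replace("%", "")
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = float(cleaned)
    return -value if negative else value


def extract_tagged(
    table: Sequence[Sequence[str]],
    cell_tags: Sequence[Sequence[bool]],
    tokens: Sequence[str],
    token_tags: Sequence[bool],
) -> List[str]:
    """Collect tagged table cells (row-major order), then tagged text spans."""
    cells = [
        cell
        for row, row_tags in zip(table, cell_tags)
        for cell, tagged in zip(row, row_tags)
        if tagged
    ]
    # Group each run of contiguous tagged tokens into a single span.
    spans: List[str] = []
    current: List[str] = []
    for token, tagged in zip(tokens, token_tags):
        if tagged:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return cells + spans


def reason(operator: str, items: Sequence[str]) -> float:
    """Apply one aggregation operator to the extracted items."""
    if operator == "count":
        # Counting only needs the number of items, not their values.
        return float(len(items))
    values = [parse_number(item) for item in items]
    return ARITHMETIC_OPERATORS[operator](values)


if __name__ == "__main__":
    # Toy, invented data: revenue for two years plus one sentence of text.
    table = [["Revenue", "1,200", "1,000"], ["Cost", "(300)", "250"]]
    cell_tags = [[False, True, True], [False, False, False]]
    tokens = ["Other", "income", "was", "50", "."]
    token_tags = [False] * len(tokens)
    items = extract_tagged(table, cell_tags, tokens, token_tags)
    print(items, reason("difference", items))  # prints ['1,200', '1,000'] 200.0
```

Keeping extraction separate from the operator mirrors the two-stage split described in the abstract; in a learned model the operator would presumably be predicted from the question rather than passed in by hand, as it is here.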

Anthology ID: 2021.acl-long.254
Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month: August
Year: 2021
Address: Online
Venues: ACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 3277–3287
URL: https://aclanthology.org/2021.acl-long.254
DOI: 10.18653/v1/2021.acl-long.254
Cite (ACL): Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, and Tat-Seng Chua. 2021. TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3277–3287, Online. Association for Computational Linguistics.
Cite (Informal): TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance (Zhu et al., ACL-IJCNLP 2021)
PDF: https://preview.aclanthology.org/ingestion-script-update/2021.acl-long.254.pdf
Video: https://preview.aclanthology.org/ingestion-script-update/2021.acl-long.254.mp4
Code: NExTplusplus/TAT-QA
Data: TAT-QA