Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline
Ruixue Liu, Shaozu Yuan, Aijun Dai, Lei Shen, Tiangang Zhu, Meng Chen, Xiaodong He
Abstract
Few-shot table understanding is a critical and challenging problem in real-world scenarios, as annotating large numbers of tables is usually costly. Pre-trained language models (PLMs), which have recently flourished on tabular data, have demonstrated their effectiveness for table understanding tasks. However, few-shot table understanding remains rarely explored, owing to the lack of public table pre-training corpora and well-defined downstream benchmark tasks, especially in Chinese. In this paper, we establish a benchmark dataset, FewTUD, which consists of 5 different tasks with human annotations to systematically explore few-shot table understanding in depth. Since large collections of public Chinese tables are not available, we also collect a large-scale, multi-domain tabular corpus to facilitate future Chinese table pre-training; it includes one million tables and related natural language text with auxiliary supervised interaction signals. Finally, we present FewTPT, a novel table PLM with rich interactions over tabular data, and evaluate its performance comprehensively on the benchmark. Our dataset and model will be released to the public soon.
- Anthology ID:
- 2022.coling-1.329
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 3741–3752
- URL:
- https://aclanthology.org/2022.coling-1.329
- Cite (ACL):
- Ruixue Liu, Shaozu Yuan, Aijun Dai, Lei Shen, Tiangang Zhu, Meng Chen, and Xiaodong He. 2022. Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3741–3752, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline (Liu et al., COLING 2022)
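- Cite (BibTeX):
- A minimal entry assembled from the metadata on this page; the citation key is a guess, since the Anthology's canonical key is not shown here.

```bibtex
@inproceedings{liu-etal-2022-few-shot,  % citation key assumed, not taken from the page
  title     = "Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline",
  author    = "Liu, Ruixue and Yuan, Shaozu and Dai, Aijun and Shen, Lei and Zhu, Tiangang and Chen, Meng and He, Xiaodong",
  booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
  month     = oct,
  year      = "2022",
  address   = "Gyeongju, Republic of Korea",
  publisher = "International Committee on Computational Linguistics",
  pages     = "3741--3752",
  url       = "https://aclanthology.org/2022.coling-1.329",
}
```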
- PDF:
- https://aclanthology.org/2022.coling-1.329.pdf