Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline

Ruixue Liu, Shaozu Yuan, Aijun Dai, Lei Shen, Tiangang Zhu, Meng Chen, Xiaodong He


Abstract
Few-shot table understanding is a critical and challenging problem in real-world scenario as annotations over large amount of tables are usually costly. Pre-trained language models (PLMs), which have recently flourished on tabular data, have demonstrated their effectiveness for table understanding tasks. However, few-shot table understanding is rarely explored due to the deficiency of public table pre-training corpus and well-defined downstream benchmark tasks, especially in Chinese. In this paper, we establish a benchmark dataset, FewTUD, which consists of 5 different tasks with human annotations to systematically explore the few-shot table understanding in depth. Since there is no large number of public Chinese tables, we also collect a large-scale, multi-domain tabular corpus to facilitate future Chinese table pre-training, which includes one million tables and related natural language text with auxiliary supervised interaction signals. Finally, we present FewTPT, a novel table PLM with rich interactions over tabular data, and evaluate its performance comprehensively on the benchmark. Our dataset and model will be released to the public soon.
Anthology ID:
2022.coling-1.329
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
3741–3752
Language:
URL:
https://aclanthology.org/2022.coling-1.329
DOI:
Bibkey:
Cite (ACL):
Ruixue Liu, Shaozu Yuan, Aijun Dai, Lei Shen, Tiangang Zhu, Meng Chen, and Xiaodong He. 2022. Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3741–3752, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline (Liu et al., COLING 2022)
Copy Citation:
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.coling-1.329.pdf