Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline

Ruixue Liu; Shaozu Yuan; Aijun Dai; Lei Shen; Tiangang Zhu; Meng Chen; Xiaodong He

Abstract

Few-shot table understanding is a critical and challenging problem in real-world scenarios, because annotating large numbers of tables is usually costly. Pre-trained language models (PLMs), which have recently flourished on tabular data, have demonstrated their effectiveness on table understanding tasks. However, few-shot table understanding is rarely explored due to the lack of public table pre-training corpora and well-defined downstream benchmark tasks, especially in Chinese. In this paper, we establish a benchmark dataset, FewTUD, which consists of 5 different tasks with human annotations, to systematically explore few-shot table understanding in depth. Since few Chinese tables are publicly available, we also collect a large-scale, multi-domain tabular corpus to facilitate future Chinese table pre-training; it includes one million tables and related natural language text with auxiliary supervised interaction signals. Finally, we present FewTPT, a novel table PLM with rich interactions over tabular data, and evaluate its performance comprehensively on the benchmark. Our dataset and model will be released to the public soon.

Anthology ID: 2022.coling-1.329
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 3741–3752
URL: https://aclanthology.org/2022.coling-1.329
Cite (ACL): Ruixue Liu, Shaozu Yuan, Aijun Dai, Lei Shen, Tiangang Zhu, Meng Chen, and Xiaodong He. 2022. Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3741–3752, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline (Liu et al., COLING 2022)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2022.coling-1.329.pdf