Structural Encoding and Pre-training Matter: Adapting BERT for Table-Based Fact Verification

Rui Dong, David Smith


Abstract
Growing concern with online misinformation has encouraged NLP research on fact verification. Since writers often base their assertions on structured data, we focus here on verifying textual statements given evidence in tables. Starting from the Table Parsing (TAPAS) model developed for question answering (Herzig et al., 2020), we find that modeling table structure improves a language model pre-trained on unstructured text. Pre-training language models on English Wikipedia table data further improves performance. Pre-training on a question answering task with column-level cell rank information achieves the best performance. With improved pre-training and cell embeddings, this approach outperforms the state-of-the-art Numerically-aware Graph Neural Network table fact verification model (GNN-TabFact), increasing statement classification accuracy from 72.2% to 73.9% even without modeling numerical information. Incorporating numerical information with cell rankings and pre-training on a question-answering task increases accuracy to 76%. We further analyze accuracy on statements implicating single rows or multiple rows and columns of tables, on different numerical reasoning subtasks, and on generalizing to detecting errors in statements derived from the ToTTo table-to-text generation dataset.
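To illustrate the structural encoding the abstract describes, the sketch below shows one way TAPAS-style structural signals (row, column, and column-level cell-rank embeddings) can be summed into BERT token embeddings, analogous to BERT's position and segment embeddings. This is a minimal illustration under assumptions of our own (module name, embedding-table sizes, hidden size), not the authors' or the TAPAS library's implementation.

import torch
import torch.nn as nn

# Sketch only: structural embeddings added to token embeddings, in the spirit of
# TAPAS (Herzig et al., 2020). All sizes below are illustrative assumptions.
class TableStructuralEmbeddings(nn.Module):
    def __init__(self, hidden_size=768, max_rows=256, max_cols=256, max_ranks=256):
        super().__init__()
        self.row_embeddings = nn.Embedding(max_rows, hidden_size)
        self.col_embeddings = nn.Embedding(max_cols, hidden_size)
        # Column-level numeric rank of each cell (e.g., 0 for non-numeric tokens).
        self.rank_embeddings = nn.Embedding(max_ranks, hidden_size)

    def forward(self, token_embeds, row_ids, col_ids, rank_ids):
        # Sum structural signals into the token embeddings, so the transformer
        # sees table position and numeric ordering alongside the word pieces.
        return (token_embeds
                + self.row_embeddings(row_ids)
                + self.col_embeddings(col_ids)
                + self.rank_embeddings(rank_ids))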
Anthology ID:
2021.eacl-main.201
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2366–2375
URL:
https://aclanthology.org/2021.eacl-main.201
DOI:
10.18653/v1/2021.eacl-main.201
Cite (ACL):
Rui Dong and David Smith. 2021. Structural Encoding and Pre-training Matter: Adapting BERT for Table-Based Fact Verification. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2366–2375, Online. Association for Computational Linguistics.
Cite (Informal):
Structural Encoding and Pre-training Matter: Adapting BERT for Table-Based Fact Verification (Dong & Smith, EACL 2021)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2021.eacl-main.201.pdf
Data
ToTTo