FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining

Zhoujun Cheng, Haoyu Dong, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang


Abstract
Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In this paper, we find that the spreadsheet formula, a commonly used language to perform computations on numerical values in spreadsheets, is a valuable supervision for numerical reasoning in tables. Considering large amounts of spreadsheets available on the web, we propose FORTAP, the first exploration to leverage spreadsheet formulas for table pretraining. Two novel self-supervised pretraining objectives are derived from formulas, numerical reference prediction (NRP) and numerical calculation prediction (NCP). While our proposed objectives are generic for encoders, to better capture spreadsheet table layouts and structures, FORTAP is built upon TUTA, the first transformer-based method for spreadsheet table pretraining with tree attention. FORTAP outperforms state-of-the-art methods by large margins on three representative datasets of formula prediction, question answering, and cell type classification, showing the great potential of leveraging formulas for table pretraining.
Anthology ID:
2022.acl-long.82
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1150–1166
Language:
URL:
https://aclanthology.org/2022.acl-long.82
DOI:
10.18653/v1/2022.acl-long.82
Bibkey:
Cite (ACL):
Zhoujun Cheng, Haoyu Dong, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, and Dongmei Zhang. 2022. FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1150–1166, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining (Cheng et al., ACL 2022)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2022.acl-long.82.pdf
Software:
 2022.acl-long.82.software.zip
Code
 microsoft/TUTA_table_understanding