Xiaoman Zhao

2022

PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training
Zihui Gu | Ju Fan | Nan Tang | Preslav Nakov | Xiaoman Zhao | Xiaoyong Du
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Fact verification has attracted considerable attention recently, e.g., in journalism, marketing, and policymaking, as misinformation and disinformation can sway one's opinion and affect one's actions. While fact-checking is a hard task in general, in many cases false statements can be easily debunked through analytics over tables containing reliable information. Hence, table-based fact verification has recently emerged as an important and growing research area. Yet, progress has been limited by the lack of datasets that can be used to pre-train language models (LMs) to be aware of common table operations, such as aggregating a column or comparing tuples. To bridge this gap, this paper introduces PASTA for table-based fact verification via pre-training with synthesized sentence–table cloze questions. In particular, we design six types of common sentence–table cloze tasks, namely Filter, Aggregation, Superlative, Comparative, Ordinal, and Unique, based on which we synthesize a large corpus of 1.2 million sentence–table pairs from WikiTables. PASTA uses a recent pre-trained LM, DeBERTaV3, and further pre-trains it on our corpus. Our experimental results show that PASTA achieves new state-of-the-art (SOTA) performance on two table-based fact verification datasets, TabFact and SEM-TAB-FACTS. In particular, on the complex set of TabFact, which contains multiple operations, PASTA outperforms the previous SOTA by 4.7 percentage points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small test set is narrowed to just 1.5 percentage points (90.6% vs. 92.1%).
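To make the idea of a sentence–table cloze question more concrete, here is a toy sketch in Python. It assumes a table stored as a list of row dictionaries and uses two made-up templates, one in the spirit of the Aggregation type and one in the spirit of the Superlative type. The table, the function names `aggregation_cloze` and `superlative_cloze`, the templates, and the `[MASK]` token are all hypothetical illustrations; this is not PASTA's actual synthesis procedure over WikiTables, which the abstract describes only at a high level.

```python
from dataclasses import dataclass

# Hypothetical toy table: a list of row dictionaries keyed by column name.
TABLE = {
    "header": ["team", "wins"],
    "rows": [
        {"team": "Lions", "wins": "12"},
        {"team": "Tigers", "wins": "9"},
        {"team": "Bears", "wins": "15"},
    ],
}

MASK = "[MASK]"  # assumed placeholder; the real mask token depends on the LM's tokenizer


@dataclass
class ClozeExample:
    sentence: str  # sentence containing one MASK placeholder
    answer: str    # span the model must predict, computed from the table


def aggregation_cloze(table: dict, column: str) -> ClozeExample:
    """Aggregation-style cloze (illustrative): mask the sum of a numeric column."""
    total = sum(float(row[column]) for row in table["rows"])
    return ClozeExample(
        sentence=f"The total {column} across all rows is {MASK}.",
        answer=f"{total:g}",  # format 36.0 as "36"
    )


def superlative_cloze(table: dict, key_column: str, value_column: str) -> ClozeExample:
    """Superlative-style cloze (illustrative): mask the row with the largest value."""
    best = max(table["rows"], key=lambda row: float(row[value_column]))
    return ClozeExample(
        sentence=f"The {key_column} with the most {value_column} is {MASK}.",
        answer=str(best[key_column]),
    )


if __name__ == "__main__":
    for example in (aggregation_cloze(TABLE, "wins"),
                    superlative_cloze(TABLE, "team", "wins")):
        # Each (sentence, table) pair would be a pre-training input; the answer is the target.
        print(example.sentence, "->", example.answer)
```

Running the script prints `The total wins across all rows is [MASK]. -> 36` and `The team with the most wins is [MASK]. -> Bears`. Because the answer is computed from the table rather than copied from a single cell, recovering the masked span requires performing the operation (a sum or an argmax in this sketch), which is the intuition behind making an LM aware of table operations.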