A Relation Extraction Dataset for Knowledge Extraction from Web Tables

Siffi Singh, Alham Fikri Aji, Gaurav Singh, Christos Christodoulopoulos


Abstract
Relational web tables are a significant source of structured information and are widely used for relation extraction and for populating knowledge graphs with facts. To transform web-table data into knowledge, we need to identify the relations that hold between column pairs. Currently, only a handful of publicly available datasets have relations annotated against natural web tables. Most datasets are either constructed from synthetic tables that lack valuable metadata or are too small to serve as a challenging evaluation set. In this paper, we present REDTab, the largest natural-table relation extraction dataset. We annotated ~9K tables and ~22K column pairs using crowdsourced annotators from MTurk, yielding 50x more column pairs than the existing human-annotated benchmark. Our test set is specifically designed to be challenging, as our experimental results with TaBERT show. We publicly release REDTab as a benchmark for evaluating relation extraction.
Anthology ID:
2022.coling-1.203
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2319–2327
URL:
https://aclanthology.org/2022.coling-1.203
Cite (ACL):
Siffi Singh, Alham Fikri Aji, Gaurav Singh, and Christos Christodoulopoulos. 2022. A Relation Extraction Dataset for Knowledge Extraction from Web Tables. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2319–2327, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
A Relation Extraction Dataset for Knowledge Extraction from Web Tables (Singh et al., COLING 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2022.coling-1.203.pdf
Code
alexa/alexa-dataset-redtab
Data
DBpedia, T2Dv2