90% F1 Score in Relation Triple Extraction: Is it Real?

Pratik Saini; Samiran Pal; Tapas Nayak; Indrajit Bhattacharya

doi:10.18653/v1/2023.genbench-1.1

Abstract

Extracting relational triples from text is a crucial task for constructing knowledge bases. Recent advancements in joint entity and relation extraction models have demonstrated remarkable F1 scores (≥ 90%) in extracting relational triples from free text. However, these models have been evaluated under restrictive experimental settings and on unrealistic datasets: they overlook sentences with zero triples (zero cardinality), thereby simplifying the task. In this paper, we present a benchmark study of state-of-the-art joint entity and relation extraction models under a more realistic setting, including sentences that lack any triples in our experiments to provide a comprehensive evaluation. Our findings reveal a significant decline in the models' F1 scores in this realistic experimental setup (approximately 10–15% on one dataset and 6–14% on another). Furthermore, we propose a two-step modeling approach that uses a simple BERT-based classifier. This approach improves the overall performance of these models in the realistic experimental setting.
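The toy sketch below illustrates why adding zero-triple sentences can lower F1 and how a two-step pipeline can recover it. It assumes micro-averaged precision, recall and F1 over exact-match (head, relation, tail) triples, and it assumes the classifier's role is to flag sentences containing at least one triple before the extractor runs. The sentences, `toy_extractor` and `toy_classifier` (a keyword rule standing in for the BERT-based classifier) are hypothetical and invented for illustration; the numbers are not the paper's results.

```python
from typing import Callable, List, Set, Tuple

# A relation triple: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def micro_f1(gold: List[Set[Triple]], pred: List[Set[Triple]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match triples."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def two_step_extract(
    sentences: List[str],
    has_triples: Callable[[str], bool],
    extract: Callable[[str], Set[Triple]],
) -> List[Set[Triple]]:
    """Step 1: a binary classifier decides whether a sentence contains any triple.
    Step 2: the extractor runs only on sentences that pass; the rest get no triples."""
    return [extract(s) if has_triples(s) else set() for s in sentences]


# Toy data (hypothetical): two sentences with triples, two with zero triples.
sentences = [
    "Obama was born in Hawaii.",
    "Paris is the capital of France.",
    "Obama visited France last year.",
    "The weather was pleasant.",
]
gold = [
    {("Obama", "born_in", "Hawaii")},
    {("Paris", "capital_of", "France")},
    set(),
    set(),
]

# Hypothetical extractor outputs: it hallucinates a triple on a zero-triple sentence.
_toy_outputs = {
    sentences[0]: {("Obama", "born_in", "Hawaii")},
    sentences[1]: {("Paris", "capital_of", "France")},
    sentences[2]: {("Obama", "born_in", "France")},
    sentences[3]: set(),
}


def toy_extractor(s: str) -> Set[Triple]:
    return set(_toy_outputs[s])


def toy_classifier(s: str) -> bool:
    # Hypothetical keyword rule standing in for a BERT-based sentence classifier.
    return "born" in s or "capital" in s


pred_no_gate = [toy_extractor(s) for s in sentences]
pred_gated = two_step_extract(sentences, toy_classifier, toy_extractor)

print(micro_f1(gold[:2], pred_no_gate[:2]))  # triple-bearing sentences only: P = R = F1 = 1.0
print(micro_f1(gold, pred_no_gate))          # with zero-triple sentences: P = 2/3, R = 1.0, F1 ≈ 0.8
print(micro_f1(gold, pred_gated))            # after gating: P = R = F1 = 1.0
```

In this sketch the spurious triple on a zero-cardinality sentence only hurts precision, since such sentences contribute no gold triples to recall; a gating classifier helps exactly when it filters out such sentences without discarding ones that do contain triples.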

Anthology ID:: 2023.genbench-1.1
Volume:: Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP
Month:: December
Year:: 2023
Address:: Singapore
Editors:: Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Koustuv Sinha, Amirhossein Kazemnejad, Christos Christodoulopoulos, Ryan Cotterell, Elia Bruni
Venues:: GenBench | WS
Publisher:: Association for Computational Linguistics
Pages:: 1–11
URL:: https://aclanthology.org/2023.genbench-1.1
DOI:: 10.18653/v1/2023.genbench-1.1
Cite (ACL):: Pratik Saini, Samiran Pal, Tapas Nayak, and Indrajit Bhattacharya. 2023. 90% F1 Score in Relation Triple Extraction: Is it Real?. In Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP, pages 1–11, Singapore. Association for Computational Linguistics.
Cite (Informal):: 90% F1 Score in Relation Triple Extraction: Is it Real? (Saini et al., GenBench-WS 2023)
PDF:: https://preview.aclanthology.org/emnlp22-frontmatter/2023.genbench-1.1.pdf