Iterative Document-level Information Extraction via Imitation Learning

Yunmo Chen, William Gantt, Weiwei Gu, Tongfei Chen, Aaron White, Benjamin Van Durme


Abstract
We present IterX, a novel iterative extraction model for extracting complex relations, or templates: N-tuples representing a mapping from named slots to spans of text within a document. A document may contain zero or more instances of a template of any given type, and the task of template extraction entails identifying the templates in a document and extracting each template’s slot values. Our imitation learning approach casts the problem as a Markov decision process (MDP) and removes the need for predefined template orders when training an extractor. It achieves state-of-the-art results on two established benchmarks (4-ary relation extraction on SciREX and template extraction on MUC-4) and provides a strong baseline on the new BETTER Granular task.
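As a rough illustration of the template structure the abstract describes (a mapping from named slots to spans of text), the sketch below shows one way such an object could be represented. It is not the IterX implementation: the class names, the slot names, the choice to allow several spans per slot, and the toy sentence are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """A contiguous span of tokens: start inclusive, end exclusive (assumed convention)."""
    start: int
    end: int


@dataclass
class Template:
    """One template instance: a type plus a mapping from slot names to spans.

    Allowing a list of spans per slot is an assumption, not taken from the paper.
    """
    template_type: str
    slots: Dict[str, List[Span]] = field(default_factory=dict)


def render(tokens: List[str], template: Template) -> Dict[str, List[str]]:
    """Resolve each slot's spans to the surface text they cover."""
    return {
        slot: [" ".join(tokens[s.start:s.end]) for s in spans]
        for slot, spans in template.slots.items()
    }


# Invented toy document; the slot names below are hypothetical.
tokens = "A bomb exploded near the embassy in Lima".split()
template = Template(
    template_type="Bombing",
    slots={"Weapon": [Span(1, 2)], "Target": [Span(4, 6)]},
)

# A document may carry zero or more templates, so it is modeled as a list.
document_templates: List[Template] = [template]

for t in document_templates:
    print(t.template_type, render(tokens, t))
```

Keeping spans as token offsets rather than strings preserves where in the document each slot value came from, which matters when the same string appears more than once.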
Anthology ID:
2023.eacl-main.136
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1858–1874
URL:
https://aclanthology.org/2023.eacl-main.136
DOI:
10.18653/v1/2023.eacl-main.136
Award:
EACL Outstanding Paper
Cite (ACL):
Yunmo Chen, William Gantt, Weiwei Gu, Tongfei Chen, Aaron White, and Benjamin Van Durme. 2023. Iterative Document-level Information Extraction via Imitation Learning. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 1858–1874, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Iterative Document-level Information Extraction via Imitation Learning (Chen et al., EACL 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2023.eacl-main.136.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-1/2023.eacl-main.136.mp4