Active Learning for Interactive Relation Extraction in a French Newspaper’s Articles

Cyrielle Mallart, Michel Le Nouy, Guillaume Gravier, Pascale Sébillot


Abstract
Relation extraction is a subtask of natural language processing that has seen many improvements in recent years with the advent of complex pre-trained architectures. Many of these state-of-the-art approaches are evaluated on benchmarks of labelled sentences with tagged entities, and require extensive pre-training and fine-tuning on task-specific data. However, in a real use-case scenario, such as a newspaper company mostly dedicated to local information, relations are of varied, highly specific types, virtually no annotated data exists for them, and many entities co-occur in a sentence without being related. We question the use of supervised state-of-the-art models in such a context, where resources such as time, computing power and human annotators are limited. To adapt to these constraints, we experiment with an active-learning-based relation extraction pipeline, consisting of a lightweight binary LSTM-based model that detects the relations that actually exist, and a state-of-the-art model for relation classification. We compare several classification models in this scenario, from basic word embedding averaging to graph neural networks and BERT-based models, as well as several active learning acquisition strategies, in order to find the most cost-efficient yet accurate approach for the use case of our company, the largest French daily newspaper.
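The abstract does not name the acquisition strategies that were compared. As a hedged illustration only, the sketch below shows three common uncertainty-based acquisition functions (least confidence, margin, entropy) used in pool-based active learning to choose which unlabelled candidate entity pairs to send to a human annotator. It assumes a model that outputs a probability distribution per candidate, e.g. [P(no relation), P(relation)] from a binary detector; the function names (`least_confidence`, `margin`, `entropy`, `select_batch`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical acquisition functions: each maps a predicted probability
# distribution to an uncertainty score (higher = more informative to label).

def least_confidence(probs: Sequence[float]) -> float:
    """1 minus the probability of the most likely class."""
    return 1.0 - max(probs)

def margin(probs: Sequence[float]) -> float:
    """Negated gap between the two most likely classes (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return -(top[0] - top[1])

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_batch(pool_probs: List[Sequence[float]], k: int,
                 score: Callable[[Sequence[float]], float]) -> List[int]:
    """Return the indices of the k most uncertain unlabelled examples."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Assumed detector outputs [P(no relation), P(relation)] for four candidate pairs.
pool = [[0.95, 0.05], [0.52, 0.48], [0.30, 0.70], [0.60, 0.40]]
for name, fn in [("least_confidence", least_confidence),
                 ("margin", margin), ("entropy", entropy)]:
    print(name, select_batch(pool, 2, fn))
```

On this toy pool all three strategies pick candidates 1 and 3, whose predictions are closest to 50/50. In an active learning loop, the selected pairs would be labelled, added to the training set, and the model retrained before the next round; with more than two classes the three strategies can rank examples differently.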
Anthology ID: 2021.ranlp-1.101
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month: September
Year: 2021
Address: Held Online
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 886–894
URL: https://aclanthology.org/2021.ranlp-1.101
Cite (ACL): Cyrielle Mallart, Michel Le Nouy, Guillaume Gravier, and Pascale Sébillot. 2021. Active Learning for Interactive Relation Extraction in a French Newspaper’s Articles. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 886–894, Held Online. INCOMA Ltd.
Cite (Informal): Active Learning for Interactive Relation Extraction in a French Newspaper’s Articles (Mallart et al., RANLP 2021)
PDF: https://preview.aclanthology.org/update-css-js/2021.ranlp-1.101.pdf