Semi-Automated Labeling of Requirement Datasets for Relation Extraction

Jeremias Bohn, Jannik Fischbach, Martin Schmitt, Hinrich Schütze, Andreas Vogelsang


Abstract
Having human annotators create datasets manually is a laborious task that can lead to biased and inhomogeneous labels. We propose a flexible, semi-automatic framework for labeling data for relation extraction. Furthermore, we provide a dataset of preprocessed sentences from the requirements engineering domain, including a set of automatically created as well as hand-crafted labels. In our case study, we compare the human and automatic labels and show that there is a substantial overlap between the two sets of annotations.
Anthology ID:
2021.bucc-1.6
Volume:
Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)
Month:
September
Year:
2021
Address:
Online (Virtual Mode)
Editors:
Reinhard Rapp, Serge Sharoff, Pierre Zweigenbaum
Venue:
BUCC
Publisher:
INCOMA Ltd.
Pages:
40–45
URL:
https://aclanthology.org/2021.bucc-1.6
Cite (ACL):
Jeremias Bohn, Jannik Fischbach, Martin Schmitt, Hinrich Schütze, and Andreas Vogelsang. 2021. Semi-Automated Labeling of Requirement Datasets for Relation Extraction. In Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021), pages 40–45, Online (Virtual Mode). INCOMA Ltd.
Cite (Informal):
Semi-Automated Labeling of Requirement Datasets for Relation Extraction (Bohn et al., BUCC 2021)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/2021.bucc-1.6.pdf