Semi-Automated Labeling of Requirement Datasets for Relation Extraction
Jeremias Bohn, Jannik Fischbach, Martin Schmitt, Hinrich Schütze, Andreas Vogelsang
Abstract
Creating datasets manually by human annotators is a laborious task that can lead to biased and inhomogeneous labels. We propose a flexible, semi-automatic framework for labeling data for relation extraction. Furthermore, we provide a dataset of preprocessed sentences from the requirements engineering domain, including a set of automatically created as well as hand-crafted labels. In our case study, we compare the human and automatic labels and show that there is a substantial overlap between both annotations.
- Anthology ID:
- 2021.bucc-1.6
- Volume:
- Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)
- Month:
- September
- Year:
- 2021
- Address:
- Online (Virtual Mode)
- Editors:
- Reinhard Rapp, Serge Sharoff, Pierre Zweigenbaum
- Venue:
- BUCC
- Publisher:
- INCOMA Ltd.
- Pages:
- 40–45
- URL:
- https://aclanthology.org/2021.bucc-1.6
- Cite (ACL):
- Jeremias Bohn, Jannik Fischbach, Martin Schmitt, Hinrich Schütze, and Andreas Vogelsang. 2021. Semi-Automated Labeling of Requirement Datasets for Relation Extraction. In Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021), pages 40–45, Online (Virtual Mode). INCOMA Ltd.
- Cite (Informal):
- Semi-Automated Labeling of Requirement Datasets for Relation Extraction (Bohn et al., BUCC 2021)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2021.bucc-1.6.pdf