DEFT: A corpus for definition extraction in free- and semi-structured text

Sasha Spala, Nicholas A. Miller, Yiming Yang, Franck Dernoncourt, Carl Dockhorn


Abstract
Definition extraction has been a popular topic in NLP research for well over a decade, but it has historically been limited to well-defined, structured, and narrow conditions. In reality, natural language is messy, and messy data requires both complex solutions and data that reflects that reality. In this paper, we present a robust English corpus and annotation schema that allows us to explore the less straightforward examples of term-definition structures in free and semi-structured text.
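To make the abstract's contrast concrete, here is a minimal sketch of the kind of narrow, pattern-based extractor it alludes to. The patterns and the `extract_definition` function are hypothetical illustrations only. They are not the DEFT annotation schema or the authors' method. The sketch recovers a term and definition from a textbook-style copula sentence, but it returns nothing for a freer phrasing of the same idea.

```python
import re

# Hypothetical copula-style patterns of the kind narrowly scoped definition
# extractors rely on. These are illustrative assumptions, NOT the DEFT schema.
PATTERNS = [
    re.compile(r"^(?P<term>[A-Z][\w\s-]*?) is defined as (?P<definition>.+?)\.?$"),
    re.compile(r"^(?P<term>[A-Z][\w\s-]*?) (?:is|are) (?:a|an|the) (?P<definition>.+?)\.?$"),
]


def extract_definition(sentence):
    """Return (term, definition) from the first matching pattern, else None."""
    for pattern in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return match.group("term").strip(), match.group("definition").strip()
    return None


# A structured sentence matches the first pattern.
print(extract_definition(
    "Photosynthesis is defined as the process by which plants convert light into chemical energy."
))
# A freer phrasing of the same definition matches neither pattern.
print(extract_definition(
    "Plants convert light into chemical energy, a process we call photosynthesis."
))
```

The second call fails because the term appears after its definition and no copula links them. This is the kind of less straightforward term-definition structure the corpus is meant to capture.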
Anthology ID: W19-4015
Volume: Proceedings of the 13th Linguistic Annotation Workshop
Month: August
Year: 2019
Address: Florence, Italy
Editors: Annemarie Friedrich, Deniz Zeyrek, Jet Hoek
Venue: LAW
SIG: SIGANN
Publisher: Association for Computational Linguistics
Pages: 124–131
URL: https://aclanthology.org/W19-4015
DOI: 10.18653/v1/W19-4015
Cite (ACL): Sasha Spala, Nicholas A. Miller, Yiming Yang, Franck Dernoncourt, and Carl Dockhorn. 2019. DEFT: A corpus for definition extraction in free- and semi-structured text. In Proceedings of the 13th Linguistic Annotation Workshop, pages 124–131, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): DEFT: A corpus for definition extraction in free- and semi-structured text (Spala et al., LAW 2019)
PDF: https://preview.aclanthology.org/ml4al-ingestion/W19-4015.pdf
Data: DEFT Corpus