SeCoDa: Sense Complexity Dataset
David Strohmaier, Sian Gooding, Shiva Taslimipoor, Ekaterina Kochmar
Abstract
The Sense Complexity Dataset (SeCoDa) provides a corpus that is annotated jointly for complexity and word senses. It thus provides a valuable resource for both word sense disambiguation and the task of complex word identification. The intention is that this dataset will be used to identify complexity at the level of word senses rather than word tokens. For word sense annotation SeCoDa uses a hierarchical scheme that is based on information available in the Cambridge Advanced Learner’s Dictionary. This way we can offer more coarse-grained senses than directly available in WordNet.
- Anthology ID:
- 2020.lrec-1.730
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 5962–5967
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.730
- Cite (ACL):
- David Strohmaier, Sian Gooding, Shiva Taslimipoor, and Ekaterina Kochmar. 2020. SeCoDa: Sense Complexity Dataset. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5962–5967, Marseille, France. European Language Resources Association.
- Cite (Informal):
- SeCoDa: Sense Complexity Dataset (Strohmaier et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2020.lrec-1.730.pdf