Abstract
Identifying complex words in texts is an important first step in text simplification (TS) systems. In this paper, we investigate the performance of binary comparative Lexical Complexity Prediction (LCP) models applied to a popular benchmark dataset, the CompLex 2.0 dataset used in SemEval-2021 Task 1. With the data from CompLex 2.0, we create a new dataset containing 1,940 sentences, referred to as CompLex-BC. Using CompLex-BC, we train multiple models to determine which of two target words in the same sentence is more complex. A linear SVM model achieved the best performance in our experiments with an F1-score of 0.86.
- Anthology ID:
- 2022.bea-1.24
- Volume:
- Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, Washington
- Editors:
- Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
- Venue:
- BEA
- SIG:
- SIGEDU
- Publisher:
- Association for Computational Linguistics
- Pages:
- 197–203
- URL:
- https://aclanthology.org/2022.bea-1.24
- DOI:
- 10.18653/v1/2022.bea-1.24
- Cite (ACL):
- Kai North, Marcos Zampieri, and Matthew Shardlow. 2022. An Evaluation of Binary Comparative Lexical Complexity Models. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), pages 197–203, Seattle, Washington. Association for Computational Linguistics.
- Cite (Informal):
- An Evaluation of Binary Comparative Lexical Complexity Models (North et al., BEA 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.bea-1.24.pdf
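
The binary comparative setup described in the abstract (deciding which of two target words in the same sentence is more complex) can be illustrated with a minimal sketch of how comparative pairs might be derived from per-word complexity scores. This is not the authors' construction procedure for CompLex-BC, which the abstract does not detail. The names `Target`, `Pair`, and `build_pairs`, the assumption that scores are normalized to [0, 1], the choice to skip ties, and the toy values are all hypothetical and used only for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Target(NamedTuple):
    sentence_id: str
    word: str
    complexity: float  # assumed: a normalized complexity score in [0, 1]


class Pair(NamedTuple):
    sentence_id: str
    word_a: str
    word_b: str
    label: int  # 1 if word_a is more complex than word_b, else 0


def build_pairs(targets: list[Target]) -> list[Pair]:
    """Turn per-word scores into binary comparative examples.

    Only words annotated in the same sentence are compared, mirroring
    the "same sentence" condition stated in the abstract.
    """
    # Group the annotated target words by the sentence they occur in.
    by_sentence: dict[str, list[Target]] = {}
    for t in targets:
        by_sentence.setdefault(t.sentence_id, []).append(t)

    pairs: list[Pair] = []
    for sid, group in by_sentence.items():
        for a, b in combinations(group, 2):
            if a.complexity == b.complexity:
                continue  # assumption: ties carry no comparative signal
            pairs.append(Pair(sid, a.word, b.word, int(a.complexity > b.complexity)))
    return pairs


# Toy, made-up scores (not taken from CompLex 2.0).
toy = [
    Target("s1", "river", 0.10),
    Target("s1", "estuary", 0.45),
    Target("s2", "cell", 0.20),
    Target("s2", "mitochondria", 0.60),
    Target("s3", "run", 0.05),  # only one target in s3, so it yields no pair
]
pairs = build_pairs(toy)
for p in pairs:
    print(p)
```

Each resulting `Pair` could then be converted to a feature vector and passed to a binary classifier such as a linear SVM; the feature set is left out here because the abstract does not specify it.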