An Evaluation of Binary Comparative Lexical Complexity Models

Kai North, Marcos Zampieri, Matthew Shardlow


Abstract
Identifying complex words in texts is an important first step in text simplification (TS) systems. In this paper, we investigate the performance of binary comparative Lexical Complexity Prediction (LCP) models applied to a popular benchmark dataset, the CompLex 2.0 dataset used in SemEval-2021 Task 1. With the data from CompLex 2.0, we create a new dataset containing 1,940 sentences, referred to as CompLex-BC. Using CompLex-BC, we train multiple models to determine which of two target words in the same sentence is more complex. A linear SVM model achieved the best performance in our experiments with an F1-score of 0.86.
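As an illustration only, the binary comparative setup described in the abstract can be sketched as turning per-word complexity scores into pairwise labels. The row format, the function name `make_comparative_pairs`, and the toy sentences and scores below are assumptions made for this sketch; they are not the authors' actual CompLex-BC construction procedure or data.

```python
from collections import defaultdict
from itertools import combinations


def make_comparative_pairs(rows):
    """Build binary comparative examples from scored target words.

    rows: iterable of (sentence, target_word, complexity_score) tuples.
    This row layout is hypothetical and only loosely modelled on a
    dataset with one complexity score per target word.
    Returns a list of (sentence, word_a, word_b, label) tuples, where
    label is 1 if word_a is scored as more complex than word_b, else 0.
    """
    by_sentence = defaultdict(list)
    for sentence, word, score in rows:
        by_sentence[sentence].append((word, score))

    pairs = []
    for sentence, targets in by_sentence.items():
        # Compare every pair of target words that share a sentence.
        for (word_a, score_a), (word_b, score_b) in combinations(targets, 2):
            if score_a == score_b:
                continue  # no ordering is defined for ties in this sketch
            label = 1 if score_a > score_b else 0
            pairs.append((sentence, word_a, word_b, label))
    return pairs


# Toy rows with made-up scores, for illustration only.
toy_rows = [
    ("The physician prescribed a remedy.", "physician", 0.45),
    ("The physician prescribed a remedy.", "remedy", 0.30),
    ("The cat sat down.", "cat", 0.05),
]
pairs = make_comparative_pairs(toy_rows)
```

Pairs labelled this way could then be converted to feature vectors and passed to any binary classifier, such as the linear SVM mentioned in the abstract; the feature set is left open here because the abstract does not specify it.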
Anthology ID:
2022.bea-1.24
Volume:
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
Month:
July
Year:
2022
Address:
Seattle, Washington
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
197–203
URL:
https://aclanthology.org/2022.bea-1.24
DOI:
10.18653/v1/2022.bea-1.24
Cite (ACL):
Kai North, Marcos Zampieri, and Matthew Shardlow. 2022. An Evaluation of Binary Comparative Lexical Complexity Models. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), pages 197–203, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
An Evaluation of Binary Comparative Lexical Complexity Models (North et al., BEA 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.bea-1.24.pdf