Abstract
This paper describes the system developed by the Laboratoire d’analyse statistique des textes (LAST) for the Lexical Complexity Prediction shared task at SemEval-2021. The proposed system consists of a LightGBM model fed with features obtained from many word frequency lists, published lexical norms and psychometric data. To handle the specifics of the multi-word task, it uses bigram association measures. Although the only contextual feature used was sentence length, the system achieved a respectable performance in the multi-word task, but performed less well in the single-word task. The bigram association measures were found useful, but only to a limited extent.
- Anthology ID:
- 2021.semeval-1.71
- Volume:
- Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 571–577
- URL:
- https://aclanthology.org/2021.semeval-1.71
- DOI:
- 10.18653/v1/2021.semeval-1.71
- Cite (ACL):
- Yves Bestgen. 2021. LAST at SemEval-2021 Task 1: Improving Multi-Word Complexity Prediction Using Bigram Association Measures. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 571–577, Online. Association for Computational Linguistics.
- Cite (Informal):
- LAST at SemEval-2021 Task 1: Improving Multi-Word Complexity Prediction Using Bigram Association Measures (Bestgen, SemEval 2021)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2021.semeval-1.71.pdf
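
The abstract mentions bigram association measures without naming them. As a rough illustration only, the sketch below computes two widely used measures, pointwise mutual information (PMI) and the t-score, from raw corpus counts. The choice of these two measures, the function name `bigram_association`, and the example counts are assumptions made for illustration and are not taken from the paper.

```python
import math


def bigram_association(count_xy, count_x, count_y, total_bigrams):
    """Return PMI and t-score for a bigram (x, y) from raw corpus counts.

    Hypothetical helper: the paper's actual measures and corpus are not
    specified in the abstract.
    """
    if count_xy <= 0:
        raise ValueError("count_xy must be positive for PMI to be defined")
    # Frequency expected if x and y occurred independently of each other.
    expected = count_x * count_y / total_bigrams
    # PMI: log ratio of observed to expected co-occurrence frequency.
    pmi = math.log2(count_xy / expected)
    # t-score: observed minus expected, scaled by sqrt of observed.
    t_score = (count_xy - expected) / math.sqrt(count_xy)
    return {"pmi": pmi, "t_score": t_score}


# Made-up counts, for illustration only.
scores = bigram_association(count_xy=8, count_x=16, count_y=32, total_bigrams=1024)
print(scores)
```

Scores of this kind could then be appended to the feature vector of a multi-word expression before it is passed to a gradient-boosting regressor such as LightGBM.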