UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders

Milton King, Ali Hakimi Parizi, Samin Fakharian, Paul Cook


Abstract
In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1. We explore the use of statistical baseline features, masked language models, and character-level encoders to predict the complexity of a target token in context. Our best system combines information from these three sources. The results indicate that information from masked language models and character-level encoders can be combined to improve lexical complexity prediction.
Anthology ID:
2021.semeval-1.83
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Venue:
SemEval
SIGs:
SIGLEX | SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
650–654
URL:
https://aclanthology.org/2021.semeval-1.83
DOI:
10.18653/v1/2021.semeval-1.83
Cite (ACL):
Milton King, Ali Hakimi Parizi, Samin Fakharian, and Paul Cook. 2021. UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 650–654, Online. Association for Computational Linguistics.
Cite (Informal):
UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders (King et al., SemEval 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.semeval-1.83.pdf