Abstract
In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1. We explore the use of statistical baseline features, masked language models, and character-level encoders to predict the complexity of a target token in context. Our best system combines information from these three sources. The results indicate that information from masked language models and character-level encoders can be combined to improve lexical complexity prediction.
- Anthology ID:
- 2021.semeval-1.83
- Volume:
- Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Alexis Palmer, Nathan Schneider, Natalie Schluter, Guy Emerson, Aurelie Herbelot, Xiaodan Zhu
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 650–654
- URL:
- https://aclanthology.org/2021.semeval-1.83
- DOI:
- 10.18653/v1/2021.semeval-1.83
- Cite (ACL):
- Milton King, Ali Hakimi Parizi, Samin Fakharian, and Paul Cook. 2021. UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 650–654, Online. Association for Computational Linguistics.
- Cite (Informal):
- UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders (King et al., SemEval 2021)
- PDF:
- https://preview.aclanthology.org/bionlp-24-ingestion/2021.semeval-1.83.pdf