Abstract
We describe the UTFPR systems submitted to the Lexical Complexity Prediction shared task of SemEval 2021. They perform complexity prediction by combining classic features, such as word frequency, n-gram frequency, word length, and number of senses, with BERT vectors. We test numerous feature combinations and machine learning models in our experiments and find that BERT vectors, even if not optimized for the task at hand, are a great complement to classic features. We also find that employing the principle of compositionality can potentially help in phrase complexity prediction. Our systems place 45th out of 55 for single words and 29th out of 38 for phrases.
- Anthology ID:
- 2021.semeval-1.78
- Volume:
- Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Alexis Palmer, Nathan Schneider, Natalie Schluter, Guy Emerson, Aurelie Herbelot, Xiaodan Zhu
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 617–622
- URL:
- https://aclanthology.org/2021.semeval-1.78
- DOI:
- 10.18653/v1/2021.semeval-1.78
- Cite (ACL):
- Gustavo Henrique Paetzold. 2021. UTFPR at SemEval-2021 Task 1: Complexity Prediction by Combining BERT Vectors and Classic Features. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 617–622, Online. Association for Computational Linguistics.
- Cite (Informal):
- UTFPR at SemEval-2021 Task 1: Complexity Prediction by Combining BERT Vectors and Classic Features (Paetzold, SemEval 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2021.semeval-1.78.pdf
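
The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the lookup tables, and the choice of three classic features are hypothetical, the embedding is assumed to be a precomputed BERT vector passed in as a plain list, and averaging word scores is only one possible reading of the compositionality principle for phrases.

```python
from statistics import mean


def classic_features(word, freq_table, sense_counts):
    """Return hypothetical classic features: frequency, length, sense count.

    freq_table and sense_counts are assumed lookup dicts keyed by
    lowercased word; unknown words fall back to 0.
    """
    key = word.lower()
    return [
        float(freq_table.get(key, 0)),
        float(len(word)),
        float(sense_counts.get(key, 0)),
    ]


def combine(word, embedding, freq_table, sense_counts):
    """Concatenate classic features with a precomputed embedding vector."""
    return classic_features(word, freq_table, sense_counts) + list(embedding)


def phrase_score(words, word_scorer):
    """Assumed compositional scoring: average the per-word complexity scores."""
    return mean(word_scorer(w) for w in words)


if __name__ == "__main__":
    freq = {"river": 120}
    senses = {"river": 2}
    # A two-dimensional stand-in for a real BERT vector.
    print(combine("River", [0.1, -0.2], freq, senses))
    # A toy scorer based on word length, standing in for a trained model.
    print(phrase_score(["ab", "abcd"], lambda w: len(w) / 4))
```

In practice the combined vector would be fed to a regression model trained on the shared task's complexity labels; the toy scorer above only stands in for such a model.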