IITK@LCP at SemEval-2021 Task 1: Classification for Lexical Complexity Regression Task

Neil Shirude, Sagnik Mukherjee, Tushar Shandhilya, Ananta Mukherjee, Ashutosh Modi


Abstract
This paper describes our contribution to SemEval-2021 Task 1 (Shardlow et al., 2021): Lexical Complexity Prediction. Our approach leverages the ELECTRA model and attempts to mirror the data annotation scheme. Although the task is a regression task, we show that it can be approached with an aggregation of several classification and regression models. This somewhat counter-intuitive approach achieved an MAE of 0.0654 on Sub-Task 1 and 0.0811 on Sub-Task 2. Additionally, we used weak supervision signals from Gloss-BERT, which significantly improved the MAE on Sub-Task 1.
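One way to read the idea of handling a regression target with a classifier is sketched below. It is an illustration only, not the authors' implementation. It assumes a five-point rating scale rescaled to the values 0, 0.25, 0.5, 0.75 and 1, and a classifier that outputs one probability per bin. The helper names `expected_complexity` and `mean_absolute_error` and all numbers are hypothetical. The predicted score is the probability-weighted mean of the bin values, and predictions are scored with MAE, the metric reported in the abstract.

```python
from typing import Sequence

# Assumed bin values: a five-point rating scale rescaled to [0, 1].
BIN_VALUES = (0.0, 0.25, 0.5, 0.75, 1.0)


def expected_complexity(probs: Sequence[float]) -> float:
    """Collapse class probabilities over the assumed bins into one score."""
    if len(probs) != len(BIN_VALUES):
        raise ValueError("expected one probability per bin")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalise so slightly unnormalised outputs still give a valid mean.
    return sum(p * v for p, v in zip(probs, BIN_VALUES)) / total


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between gold and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical classifier outputs for two target words (made-up numbers).
predictions = [
    expected_complexity([0.5, 0.5, 0.0, 0.0, 0.0]),
    expected_complexity([0.0, 0.0, 1.0, 0.0, 0.0]),
]
gold = [0.25, 0.5]
mae = mean_absolute_error(gold, predictions)
```

Taking the expected value keeps the output continuous even though the model itself only chooses among discrete bins, which is why a classifier can still be evaluated with a regression metric.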
Anthology ID:
2021.semeval-1.66
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
541–547
URL:
https://aclanthology.org/2021.semeval-1.66
DOI:
10.18653/v1/2021.semeval-1.66
Cite (ACL):
Neil Shirude, Sagnik Mukherjee, Tushar Shandhilya, Ananta Mukherjee, and Ashutosh Modi. 2021. IITK@LCP at SemEval-2021 Task 1: Classification for Lexical Complexity Regression Task. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 541–547, Online. Association for Computational Linguistics.
Cite (Informal):
IITK@LCP at SemEval-2021 Task 1: Classification for Lexical Complexity Regression Task (Shirude et al., SemEval 2021)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2021.semeval-1.66.pdf