Abstract
This paper presents our approach to SemEval-2022 Task 3, PreTENS: Presupposed Taxonomies Evaluating Neural Network Semantics. The goal of the task is to determine whether a sentence is acceptable, depending on the taxonomic relationship that holds between a pair of nouns in the sentence. For sub-task 1 (binary classification), we propose a method to improve the robustness and generalizability of language models on this downstream task: a two-stage fine-tuning procedure for the ELECTRA language model that uses data augmentation. We carry out experiments with multi-task learning and data-enriched fine-tuning, and the results show that our proposed model, UU-Tax, generalizes well on the downstream task. For sub-task 2 (regression), we propose a simple classifier trained on features obtained from the Universal Sentence Encoder (USE). In addition to describing the submitted systems, we discuss other experiments that employ pre-trained language models and data augmentation techniques. For both sub-tasks, we perform error analysis to better understand the behaviour of the proposed models. We achieved a global F1-Binary score of 91.25% in sub-task 1 and a rho score of 0.221 in sub-task 2.

- Anthology ID:
- 2022.semeval-1.35
- Volume:
- Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Editors:
- Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 271–281
- URL:
- https://aclanthology.org/2022.semeval-1.35
- DOI:
- 10.18653/v1/2022.semeval-1.35
- Cite (ACL):
- Injy Sarhan, Pablo Mosteiro, and Marco Spruit. 2022. UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 271–281, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation (Sarhan et al., SemEval 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2022.semeval-1.35.pdf
- Code:
- is5882/uu-tax