UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation

Injy Sarhan; Pablo Mosteiro; Marco Spruit

doi:10.18653/v1/2022.semeval-1.35

Abstract

This paper presents our strategy to address SemEval-2022 Task 3, PreTENS: Presupposed Taxonomies Evaluating Neural Network Semantics. The goal of the task is to identify whether a sentence is acceptable, depending on the taxonomic relationship that holds between a noun pair contained in the sentence. For sub-task 1 (binary classification), we propose an effective way to enhance the robustness and generalizability of language models for better classification on this downstream task. We design a two-stage fine-tuning procedure on the ELECTRA language model using data augmentation techniques. Rigorous experiments are carried out using multi-task learning and data-enriched fine-tuning. Experimental results demonstrate that our proposed model, UU-Tax, is indeed able to generalize well for our downstream task. For sub-task 2 (regression), we propose a simple classifier that trains on features obtained from the Universal Sentence Encoder (USE). In addition to describing the submitted systems, we discuss other experiments that employ pre-trained language models and data augmentation techniques. For both sub-tasks, we perform error analysis to further understand the behaviour of the proposed models. We achieved a global F1-Binary score of 91.25% in sub-task 1 and a rho score of 0.221 in sub-task 2.
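As a rough illustration of the two scores reported in the abstract, the sketch below computes a binary F1 for acceptability labels and a rank correlation (rho) for regression scores. It assumes that labels are encoded as 0/1 with 1 as the positive class and that rho refers to Spearman's rank correlation; the function names and the toy inputs are hypothetical, and this is not the official task scorer.

```python
from typing import List, Sequence


def f1_binary(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 of the positive class (assumed here to be label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # convention chosen for this sketch; rho is undefined here
    return cov / (vx * vy) ** 0.5


# Toy data, invented for illustration only.
print(f1_binary([1, 1, 0, 0], [1, 0, 1, 0]))
print(spearman_rho([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))
```

In practice, library routines such as scikit-learn's `f1_score` and SciPy's `spearmanr` would normally be used instead; the plain-Python version only makes the definitions explicit.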

Anthology ID: 2022.semeval-1.35
Volume: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month: July
Year: 2022
Address: Seattle, United States
Editors: Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 271–281
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.semeval-1.35/
DOI: 10.18653/v1/2022.semeval-1.35
Cite (ACL): Injy Sarhan, Pablo Mosteiro, and Marco Spruit. 2022. UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 271–281, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation (Sarhan et al., SemEval 2022)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.semeval-1.35.pdf
Video: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.semeval-1.35.mp4
Code: is5882/uu-tax
