PingAnLifeInsurance at SemEval-2023 Task 12: Sentiment Analysis for Low-resource African Languages with Multi-Model Fusion
Meizhi Jin
Cheng Chen
Mengyuan Zhou
Mengfei Yuan
Xiaolong Hou
Xiyang Du
Lianxin Jiang
Jianyu Li
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes our system for SemEval-2023 Task 12: Sentiment Analysis for Low-resource African Languages using Twitter Dataset (Muhammad et al., 2023c). The AfriSenti-SemEval Shared Task 12 is based on a collection of Twitter datasets in 14 African languages for sentiment classification. It consists of three sub-tasks. Task A is monolingual sentiment classification covering 12 African languages. Task B is multilingual sentiment classification that combines the training data from Task A (12 African languages). Task C is zero-shot sentiment classification. We utilized various strategies, including monolingual training, multilingual mixed training, and translation technology, and proposed a weighted voting method that combines the results of the different strategies. Notably, in the monolingual subtask, our system achieved Top-1 in two languages (Yoruba and Twi) and Top-2 in four languages (Nigerian Pidgin, Algerian Arabic, Swahili, and Multilingual). In the multilingual subtask, our system achieved Top-2 on the published leaderboard.
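The weighted voting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the weights are chosen or how ties are broken, so the function name `weighted_vote`, the example weights, and the alphabetical tie-break are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one predicted label per strategy into a single label.

    predictions: list of labels, one per strategy (hypothetical input format).
    weights:     list of non-negative floats, one per strategy (assumed values).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical outputs of three strategies for one tweet:
# monolingual training, multilingual mixed training, translation-based.
strategy_predictions = ["positive", "negative", "negative"]
strategy_weights = [0.5, 0.3, 0.3]
print(weighted_vote(strategy_predictions, strategy_weights))
```

In this example the two lower-weighted strategies jointly outweigh the single higher-weighted one, which is the behaviour that distinguishes weighted voting from simply trusting the best individual model.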
VarMAE: Pre-training of Variational Masked Autoencoder for Domain-adaptive Language Understanding
Dou Hu
Xiaolong Hou
Xiyang Du
Mengyuan Zhou
Lianxin Jiang
Yang Mo
Xiaofeng Shi
Findings of the Association for Computational Linguistics: EMNLP 2022
Pre-trained language models have been widely applied to standard benchmarks. Because natural language is flexible, the resources available in a given domain may be too limited to support learning precise representations. To address this issue, we propose a novel Transformer-based language model named VarMAE for domain-adaptive language understanding. Under the masked autoencoding objective, we design a context uncertainty learning module to encode the token’s context into a smooth latent distribution. The module can produce diverse and well-formed contextual representations. Experiments on science- and finance-domain NLU tasks demonstrate that VarMAE can be efficiently adapted to new domains with limited resources.
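The idea of encoding a token's context into a latent distribution can be sketched as follows. The abstract does not name the distribution family or the regulariser, so the diagonal Gaussian, the reparameterised sampling, and the KL term toward a standard normal are assumptions borrowed from standard variational autoencoders; the function names are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterised sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for one masked token's context.
mu = [0.2, -0.1, 0.4]
log_var = [-1.0, -0.5, -2.0]

rng = random.Random(0)
z = sample_latent(mu, log_var, rng)          # one stochastic context vector
penalty = kl_to_standard_normal(mu, log_var)  # keeps the latent space smooth
print(len(z), penalty >= 0.0)
```

Under these assumptions, repeated sampling yields different but nearby context vectors, which is one way to read "diverse and well-formed contextual representations".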
PALI-NLP at SemEval-2022 Task 4: Discriminative Fine-tuning of Transformers for Patronizing and Condescending Language Detection
Dou Hu
Zhou Mengyuan
Xiyang Du
Mengfei Yuan
Jin Zhi
Lianxin Jiang
Mo Yang
Xiaofeng Shi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Patronizing and condescending language (PCL) has a large harmful impact and is difficult to detect, both for human judges and existing NLP systems. At SemEval-2022 Task 4, we propose a novel Transformer-based model and its ensembles to accurately understand such language context for PCL detection. To facilitate comprehension of the subtle and subjective nature of PCL, two fine-tuning strategies are applied to capture discriminative features from diverse linguistic behaviour and categorical distribution. The system achieves remarkable results on the official ranking, including 1st in Subtask 1 and 5th in Subtask 2. Extensive experiments on the task demonstrate the effectiveness of our system and its strategies.
PAIC at SemEval-2022 Task 5: Multi-Modal Misogynous Detection in MEMES with Multi-Task Learning And Multi-model Fusion
Jin Zhi
Zhou Mengyuan
Mengfei Yuan
Dou Hu
Xiyang Du
Lianxin Jiang
Yang Mo
XiaoFeng Shi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper describes our system for SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification (MAMI). Multimedia automatic misogyny recognition consists of the identification of misogynous memes, taking advantage of both text and images as sources of information. The task is organized around two main subtasks: Task A is a binary classification task, in which a meme should be identified as either misogynous or not misogynous. Task B is a multi-label classification task, in which the types of misogyny should be identified among potentially overlapping categories, such as stereotype, shaming, objectification, and violence. In this paper, we propose a system based on multi-task learning for multi-modal misogynous detection in memes. Our system combines image features with text features to train a multi-label classifier. The final predictions are obtained by a simple weighted average over the outputs of different fusion models, and the Task A results are corrected using the Task B results. Our system achieves a test accuracy of 0.755 on Task A (ranking 3rd on the final leaderboard) and an accuracy of 0.731 on Task B (ranking 1st on the final leaderboard).
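The fusion and correction steps described in the abstract can be sketched as below. The abstract gives neither the fusion weights nor the exact correction rule, so the equal weights, the 0.5 threshold, and the rule "flag a meme as misogynous if any Task B category fires" are assumptions for illustration only; `weighted_average` and `correct_task_a` are hypothetical names.

```python
def weighted_average(prob_lists, weights):
    """Average per-label probabilities from several models, weighted per model."""
    total = sum(weights)
    num_labels = len(prob_lists[0])
    return [
        sum(w * probs[i] for probs, w in zip(prob_lists, weights)) / total
        for i in range(num_labels)
    ]


def correct_task_a(task_a_prob, task_b_probs, threshold=0.5):
    """Assumed rule: misogynous if Task A says so OR any Task B type is predicted."""
    return task_a_prob >= threshold or any(p >= threshold for p in task_b_probs)


# Hypothetical outputs of two fusion models for one meme, in the order
# [misogynous, stereotype, shaming, objectification, violence].
model_outputs = [
    [0.4, 0.2, 0.7, 0.1, 0.1],
    [0.5, 0.3, 0.9, 0.1, 0.2],
]
fused = weighted_average(model_outputs, weights=[1.0, 1.0])
is_misogynous = correct_task_a(fused[0], fused[1:])
print(is_misogynous)
```

Here the fused Task A score falls just below the threshold, but the confident "shaming" prediction from Task B flips the binary decision, which is the kind of cross-task consistency the correction step is meant to enforce.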
PALI-NLP at SemEval-2022 Task 6: iSarcasmEval- Fine-tuning the Pre-trained Model for Detecting Intended Sarcasm
Xiyang Du
Dou Hu
Jin Zhi
Lianxin Jiang
Xiaofeng Shi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper describes the method we used in SemEval-2022 Task 6, iSarcasmEval: Intended Sarcasm Detection in English and Arabic. Our system achieved 1st place in Subtask B, which is to identify the categories of intended sarcasm. The proposed system integrates multiple BERT-based, RoBERTa-based and BERTweet-based models with fine-tuning. Our contributions in this task are as follows: 1) we reveal the performance of several large pre-trained models on tasks dealing with tweet-like text; 2) our methods show that excellent results can still be achieved in this particular task without a complex classifier, by adopting a proper training method; 3) we find that there is a hierarchical relationship among the sarcasm types in this task.
PALI at SemEval-2022 Task 7: Identifying Plausible Clarifications of Implicit and Underspecified Phrases in Instructional Texts
Zhou Mengyuan
Dou Hu
Mengfei Yuan
Jin Zhi
Xiyang Du
Lianxin Jiang
Yang Mo
Xiaofeng Shi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper describes our system for SemEval-2022 Task 7 (Roth et al.): Identifying Plausible Clarifications of Implicit and Underspecified Phrases. SemEval Task 7 is more complex than a normal cloze task, which only requires an NLP system to find the best filler for a sentence. In SemEval Task 7, an NLP system needs not only to choose the best filler for each input instance, but also to evaluate the quality of all possible fillers and give each a relative score according to the semantic information of the context. We propose an ensemble of different state-of-the-art transformer-based language models (i.e., RoBERTa and DeBERTa) with some plug-and-play tricks, such as the Grouped Layerwise Learning Rate Decay (GLLRD) strategy, a contrastive learning loss, different pooling heads, and an external data preprocessing block applied before the input is passed to the pretrained language models, which together improve performance significantly. The main contributions of our system are 1) revealing the performance discrepancy of different transformer-based pretrained models on the downstream task; 2) presenting an efficient learning-rate and parameter attenuation strategy for fine-tuning pretrained language models; 3) adding different contrastive learning losses to improve model performance; 4) showing the usefulness of different pooling head structures. Our system achieves a test accuracy of 0.654 on subtask 1 (ranking 4th on the leaderboard) and a test Spearman’s rank correlation coefficient of 0.785 on subtask 2 (ranking 2nd on the leaderboard).
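The abstract names Grouped Layerwise Learning Rate Decay (GLLRD) without defining it. One plausible reading, sketched below, is that encoder layers are split into consecutive groups, the top group trains at the base learning rate, and each lower group has its rate multiplied by a decay factor once more. The group size, base rate, decay factor, and function name are assumptions for illustration and may differ from the authors' strategy.

```python
def grouped_layerwise_lrs(num_layers, group_size, base_lr, decay):
    """Return one learning rate per encoder layer (index 0 is the bottom layer).

    Assumed scheme: consecutive layers share a group; the top group uses
    base_lr and each group below it is scaled by `decay` one more time.
    """
    if num_layers <= 0 or group_size <= 0:
        raise ValueError("num_layers and group_size must be positive")
    num_groups = -(-num_layers // group_size)  # ceiling division
    rates = []
    for layer in range(num_layers):
        group = layer // group_size
        groups_below_top = num_groups - 1 - group
        rates.append(base_lr * decay ** groups_below_top)
    return rates


# Hypothetical 12-layer encoder, groups of 4 layers, halving per group.
rates = grouped_layerwise_lrs(num_layers=12, group_size=4, base_lr=2e-5, decay=0.5)
for layer, rate in enumerate(rates):
    print(f"layer {layer:2d}: lr = {rate:.2e}")
```

The resulting per-layer rates could then be passed to an optimizer as separate parameter groups, so that lower layers, which hold more general features, change more slowly than the task-specific top layers.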