Xiaobing Zhou

2024

pdf abs
Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling
Guangmin Zheng | Jin Wang | Xiaobing Zhou | Xuejie Zhang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Chain of thought (CoT) has proven useful for problems requiring complex reasoning. Many of these problems are both textual and multimodal. Given the inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of the hallucination issue, the generated soft negative rationales with high textual quality but illogical semantics do not always help improve answer accuracy. This study proposes a rationale generation method using soft negative sampling (SNSE-CoT) to mitigate hallucinations in multimodal CoT. Five methods were applied to generate soft negative samples that shared highly similar text but had different semantics from the original. Bidirectional margin loss (BML) was applied to introduce them into the traditional contrastive learning framework that involves only positive and negative samples. Extensive experiments on the ScienceQA dataset demonstrated the effectiveness of the proposed method. Code and data are released at https://github.com/zgMin/SNSE-CoT.

pdf abs
NLP_STR_teamS at SemEval-2024 Task1: Semantic Textual Relatedness based on MASK Prediction and BERT Model
Lianshuang Su | Xiaobing Zhou
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes our participation in the SemEval-2024 Task 1, “Semantic Textual Relatedness for African and Asian Languages.” This task detects the degree of semantic relatedness between pairs of sentences. Our approach is to take out the sentence pairs of each instance to construct a new sentence as the prompt template, use MASK to predict the correlation between the two sentences, use the BERT pre-training model to process and calculate the text sequence, and use the synonym replacement method in text data augmentation to expand the size of the data set. We participate in English in track A, which uses a supervised approach, and the Spearman Correlation on the test set is 0.809.

pdf abs
ignore at SemEval-2024 Task 5: A Legal Classification Model with Summary Generation and Contrastive Learning
Binjie Sun | Xiaobing Zhou
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes our work for SemEval-2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure. After analyzing the task requirements and the training dataset, we used data augmentation, adopted the large model GPT for summary generation, and added supervised contrastive learning to the basic BERT model. Our system achieved an F1 score of 0.551, ranking 14th in the competition leaderboard. Our system achieves an F1 score improvement of 0.1241 over the official baseline model.

2023

pdf abs
PCJ at SemEval-2023 Task 10: A Ensemble Model Based on Pre-trained Model for Sexism Detection and Classification in English
Chujun Pu | Xiaobing Zhou
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes the system and the resulting model submitted by our team “PCJ” to the SemEval-2023 Task 10 sub-task A contest. In this task, we need to test the English text content in the posts to determine whether there is sexism, which involves emotional text classification. Our submission system utilizes methods based on RoBERTa, SimCSE-RoBERTa pre-training models, and model ensemble to classify and train on datasets provided by the organizers. In the final assessment, our submission achieved a macro average F1 score of 0.8449, ranking 28th out of 84 teams in Task A.

pdf abs
YNUNLP at SemEval-2023 Task 2: The Pseudo Twin Tower Pre-training Model for Chinese Named Entity Recognition
Jing Li | Xiaobing Zhou
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper introduces our method in the system for SemEval 2023 Task 2: MultiCoNER II Multilingual Complex Named Entity Recognition, Track 9-Chinese. This task focuses on detecting fine-grained named entities whose data set has a fine-grained taxonomy of 36 NE classes, representing a realistic challenge for NER. In this task, we need to identify entity boundaries and category labels for the six identified categories. We use BERT embedding to represent each character in the original sentence and train CRF-Rdrop to predict named entity categories using the data set provided by the organizer. Our best submission, with a macro average F1 score of 0.5657, ranked 15th out of 22 teams.

pdf
YNU-HPCC at ROCLING 2023 MultiNER-Health Task: A transformer-based approach for Chinese healthcare NER
Chonglin Pang | You Zhang | Xiaobing Zhou
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

2022

pdf abs
Sapphire at SemEval-2022 Task 4: A Patronizing and Condescending Language Detection Model Based on Capsule Networks
Sihui Li | Xiaobing Zhou
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper introduces the related work and the results of Team Sapphire’s system for SemEval-2022 Task 4: Patronizing and Condescending Language Detection. We only participated in subtask 1. The task goal is to judge whether a news text contains PCL. This task can be considered as a task of binary classification of news texts. In this binary classification task, the BERT-base model is adopted as the pre-trained model used to represent textual information in vector form and encode it. Capsule networks is adopted to extract features from the encoded vectors. The official evaluation metric for subtask 1 is the F1 score over the positive class. Finally, our system’s submitted prediction results on test set achieved the score of 0.5187.

2021

pdf abs
Grenzlinie at SemEval-2021 Task 7: Detecting and Rating Humor and Offense
Renyuan Liu | Xiaobing Zhou
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper introduces the result of Team Grenzlinie’s experiment in SemEval-2021 task 7: HaHackathon: Detecting and Rating Humor and Offense. This task has two subtasks. Subtask1 includes the humor detection task, the humor rating prediction task, and the humor controversy detection task. Subtask2 is an offensive rating prediction task. Detection task is a binary classification task, and the rating prediction task is a regression task between 0 to 5. 0 means the task is not humorous or not offensive, 5 means the task is very humorous or very offensive. For all the tasks, this paper chooses RoBERTa as the pre-trained model. In classification tasks, Bi-LSTM and adversarial training are adopted. In the regression task, the Bi-LSTM is also adopted. And then we propose a new approach named compare method. Finally, our system achieves an F1-score of 95.05% in the humor detection task, F1-score of 61.74% in the humor controversy detection task, 0.6143 RMSE in humor rating task, 0.4761 RMSE in the offensive rating task on the test datasets.

pdf abs
hub at SemEval-2021 Task 1: Fusion of Sentence and Word Frequency to Predict Lexical Complexity
Bo Huang | Yang Bai | Xiaobing Zhou
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

In this paper, we propose a method of fusing sentence information and word frequency information for the SemEval 2021 Task 1-Lexical Complexity Prediction (LCP) shared task. In our system, the sentence information comes from the RoBERTa model, and the word frequency information comes from the Tf-Idf algorithm. Use Inception block as a shared layer to learn sentence and word frequency information We described the implementation of our best system and discussed our methods and experiments in the task. The shared task is divided into two sub-tasks. The goal of the two sub-tasks is to predict the complexity of a predetermined word. The shared task is divided into two subtasks. The goal of the two subtasks is to predict the complexity of a predetermined word. The evaluation index of the task is the Pearson correlation coefficient. Our best performance system has Pearson correlation coefficients of 0.7434 and 0.8000 in the single-token subtask test set and the multi-token subtask test set, respectively.

pdf abs
hub at SemEval-2021 Task 2: Word Meaning Similarity Prediction Model Based on RoBERTa and Word Frequency
Bo Huang | Yang Bai | Xiaobing Zhou
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper introduces the system description of the hub team, which explains the related work and experimental results of our team’s participation in SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC). The data of this shared task is mainly some cross-language or multi-language sentence pair corpus. The languages covered in the corpus include English, Chinese, French, Russian, and Arabic. The task goal is to judge whether the same words in these sentence pairs have the same meaning in the sentence. This can be seen as a task of binary classification of sentence pairs. What we need to do is to use our method to determine as accurately as possible the meaning of the words in a sentence pair are the same or different. The model used by our team is mainly composed of RoBERTa and Tf-Idf algorithms. The result evaluation index of task submission is the F1 score. We only participated in the English language task. The final score of the test set prediction results submitted by our team was 84.60.

pdf abs
hub at SemEval-2021 Task 5: Toxic Span Detection Based on Word-Level Classification
Bo Huang | Yang Bai | Xiaobing Zhou
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This article introduces the system description of the hub team, which explains the related work and experimental results of our team’s participation in SemEval 2021 Task 5: Toxic Spans Detection. The data for this shared task comes from some posts on the Internet. The task goal is to identify the toxic content contained in these text data. We need to find the span of the toxic text in the text data as accurately as possible. In the same post, the toxic text may be one paragraph or multiple paragraphs. Our team uses a classification scheme based on word-level to accomplish this task. The system we used to submit the results is ALBERT+BILSTM+CRF. The result evaluation index of the task submission is the F1 score, and the final score of the prediction result of the test set submitted by our team is 0.6640226029.

2020

pdf abs
DEEPYANG at SemEval-2020 Task 4: Using the Hidden Layer State of BERT Model for Differentiating Common Sense
Yang Bai | Xiaobing Zhou
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Introducing common sense to natural language understanding systems has received increasing research attention. To facilitate the researches on common sense reasoning, the SemEval-2020 Task 4 Commonsense Validation and Explanation(ComVE) is proposed. We participate in sub-task A and try various methods including traditional machine learning methods, deep learning methods, and also recent pre-trained language models. Finally, we concatenate the original output of BERT and the output vector of BERT hidden layer state to obtain more abundant semantic information features, and obtain competitive results. Our model achieves an accuracy of 0.8510 in the final test data and ranks 25th among all the teams.

pdf abs
BYteam at SemEval-2020 Task 5: Detecting Counterfactual Statements with BERT and Ensembles
Yang Bai | Xiaobing Zhou
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We participate in the classification tasks of SemEval-2020 Task: Subtask1: Detecting counterfactual statements of semeval-2020 task5(Detecting Counterfactuals). This paper examines different approaches and models towards detecting counterfactual statements classification. We choose the Bert model. However, the output of Bert is not a good summary of semantic information, so in order to obtain more abundant semantic information features, we modify the upper layer structure of Bert. Finally, our system achieves an accuracy of 88.90% and F1 score of 86.30% by hard voting, which ranks 6th on the final leader board of the in subtask 1 competition.

pdf abs
Zyy1510 Team at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text with Sub-word Level Representations
Yueying Zhu | Xiaobing Zhou | Hongling Li | Kunjie Dong
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper reports the zyy1510 team’s work in the International Workshop on Semantic Evaluation (SemEval-2020) shared task on Sentiment analysis for Code-Mixed (Hindi-English, English-Spanish) Social Media Text. The purpose of this task is to determine the polarity of the text, dividing it into one of the three labels positive, negative and neutral. To achieve this goal, we propose an ensemble model of word n-grams-based Multinomial Naive Bayes (MNB) and sub-word level representations in LSTM (Sub-word LSTM) to identify the sentiments of code-mixed data of Hindi-English and English-Spanish. This ensemble model combines the advantage of rich sequential patterns and the intermediate features after convolution from the LSTM model, and the polarity of keywords from the MNB model to obtain the final sentiment score. We have tested our system on Hindi-English and English-Spanish code-mixed social media data sets released for the task. Our model achieves the F1 score of 0.647 in the Hindi-English task and 0.682 in the English-Spanish task, respectively.

pdf abs
YNUtaoxin at SemEval-2020 Task 11: Identification Fragments of Propaganda Technique by Neural Sequence Labeling Models with Different Tagging Schemes and Pre-trained Language Model
Xin Tao | Xiaobing Zhou
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We only participated in the first subtask, and a neural sequence model was used to perform the sequence tagging task. We investigated the effects of different markup strategies on model performance. Bert that performed very well in NLP was used as a feature extractor.

pdf abs
Lee at SemEval-2020 Task 12: A BERT Model Based on the Maximum Self-ensemble Strategy for Identifying Offensive Language
Junyi Li | Xiaobing Zhou | Zichen Zhang
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This article describes the system submitted to SemEval 2020 Task 12: OffensEval 2020. This task aims to identify and classify offensive languages in different languages on social media. We only participate in the English part of subtask A, which aims to identify offensive languages in English. To solve this task, we propose a BERT model system based on the transform mechanism, and use the maximum self-ensemble to improve model performance. Our model achieved a macro F1 score of 0.913(ranked 13/82) in subtask A.

pdf abs
Automatic Detecting for Health-related Twitter Data with BioBERT
Yang Bai | Xiaobing Zhou
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

Social media used for health applications usually contains a large amount of data posted by users, which brings various challenges to NLP, such as spoken language, spelling errors, novel/creative phrases, etc. In this paper, we describe our system submitted to SMM4H 2020: Social Media Mining for Health Applications Shared Task which consists of five sub-tasks. We participate in subtask 1, subtask 2-English, and subtask 5. Our final submitted approach is an ensemble of various fine-tuned transformer-based models. We illustrate that these approaches perform well in imbalanced datasets (For example, the class ratio is 1:10 in subtask 2), but our model performance is not good in extremely imbalanced datasets (For example, the class ratio is 1:400 in subtask 1). Finally, in subtask 1, our result is lower than the average score, in subtask 2-English, our result is higher than the average score, and in subtask 5, our result achieves the highest score. The code is available online.

2019

pdf abs
YNU_DYX at SemEval-2019 Task 5: A Stacked BiGRU Model Based on Capsule Network in Detection of Hate
Yunxia Ding | Xiaobing Zhou | Xuejie Zhang
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system designed for SemEval 2019 Task 5 “Shared Task on Multilingual Detection of Hate”.We only participate in subtask-A in English. To address this task, we present a stacked BiGRU model based on a capsule network system. In or- der to convert the tweets into corresponding vector representations and input them into the neural network, we use the fastText tools to get word representations. Then, the sentence representation is enriched by stacked Bidirectional Gated Recurrent Units (BiGRUs) and used as the input of capsule network. Our system achieves an average F1-score of 0.546 and ranks 3rd in the subtask-A in English.

pdf abs
YNUWB at SemEval-2019 Task 6: K-max pooling CNN with average meta-embedding for identifying offensive language
Bin Wang | Xiaobing Zhou | Xuejie Zhang
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the system submitted to SemEval 2019 Task 6: OffensEval 2019. The task aims to identify and categorize offensive language in social media, we only participate in Sub-task A, which aims to identify offensive language. In order to address this task, we propose a system based on a K-max pooling convolutional neural network model, and use an argument for averaging as a valid meta-embedding technique to get a metaembedding. Finally, we also use a cyclic learning rate policy to improve model performance. Our model achieves a Macro F1-score of 0.802 (ranked 9/103) in the Sub-task A.

pdf abs
YNU_DYX at SemEval-2019 Task 9: A Stacked BiLSTM for Suggestion Mining Classification
Yunxia Ding | Xiaobing Zhou | Xuejie Zhang
Proceedings of the 13th International Workshop on Semantic Evaluation

In this paper we describe a deep-learning system that competed as SemEval 2019 Task 9-SubTask A: Suggestion Mining from Online Reviews and Forums. We use Word2Vec to learn the distributed representations from sentences. This system is composed of a Stacked Bidirectional Long-Short Memory Network (SBiLSTM) for enriching word representations before and after the sequence relationship with context. We perform an ensemble to improve the effectiveness of our model. Our official submission results achieve an F1-score 0.5659.

pdf abs
YNU-junyi in BioNLP-OST 2019: Using CNN-LSTM Model with Embeddings for SeeDev Binary Event Extraction
Junyi Li | Xiaobing Zhou | Yuhang Wu | Bin Wang
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

We participated in the BioNLP 2019 Open Shared Tasks: binary relation extraction of SeeDev task. The model was constructed us- ing convolutional neural networks (CNN) and long short term memory networks (LSTM). The full text information and context information were collected using the advantages of CNN and LSTM. The model consisted of two main modules: distributed semantic representation construction, such as word embedding, distance embedding and entity type embed- ding; and CNN-LSTM model. The F1 value of our participated task on the test data set of all types was 0.342. We achieved the second highest in the task. The results showed that our proposed method performed effectively in the binary relation extraction.

2018

pdf abs
Yuan at SemEval-2018 Task 1: Tweets Emotion Intensity Prediction using Ensemble Recurrent Neural Network
Min Wang | Xiaobing Zhou
Proceedings of the 12th International Workshop on Semantic Evaluation

We perform the LSTM and BiLSTM model for the emotion intensity prediction. We only join the third subtask in Task 1:Affect in Tweets. Our system rank 6th among all the teams.

pdf abs
YNU_Deep at SemEval-2018 Task 11: An Ensemble of Attention-based BiLSTM Models for Machine Comprehension
Peng Ding | Xiaobing Zhou
Proceedings of the 12th International Workshop on Semantic Evaluation

We firstly use GloVe to learn the distributed representations automatically from the instance, question and answer triples. Then an attentionbased Bidirectional LSTM (BiLSTM) model is used to encode the triples. We also perform a simple ensemble method to improve the effectiveness of our model. The system we developed obtains an encouraging result on this task. It achieves the accuracy 0.7472 on the test set. We rank 5th according to the official ranking.

pdf abs
Lyb3b at SemEval-2018 Task 11: Machine Comprehension Task using Deep Learning Models
Yongbin Li | Xiaobing Zhou
Proceedings of the 12th International Workshop on Semantic Evaluation

Machine Comprehension of text is a typical Natural Language Processing task which remains an elusive challenge. This paper is to solve the task 11 of SemEval-2018, Machine Comprehension using Commonsense Knowledge task. We use deep learning model to solve the problem. We build distributed word embedding of text, question and answering respectively instead of manually extracting features by linguistic tools. Meanwhile, we use a series of frameworks such as CNN model, LSTM model, LSTM with attention model and biLSTM with attention model for processing word vector. Experiments demonstrate the superior performance of biLSTM with attention framework compared to other models. We also delete high frequency words and combine word vector and data augmentation methods, achieved a certain effect. The approach we proposed rank 6th in official results, with accuracy rate of 0.7437 in test dataset.

pdf abs
YNU Deep at SemEval-2018 Task 12: A BiLSTM Model with Neural Attention for Argument Reasoning Comprehension
Peng Ding | Xiaobing Zhou
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the system submitted to SemEval-2018 Task 12 (The Argument Reasoning Comprehension Task). Enabling a computer to understand a text so that it can answer comprehension questions is still a challenging goal of NLP. We propose a Bidirectional LSTM (BiLSTM) model that reads two sentences separated by a delimiter to determine which warrant is correct. We extend this model with a neural attention mechanism that encourages the model to make reasoning over the given claims and reasons. Officially released results show that our system ranks 6th among 22 submissions to this task.

pdf abs
Lyb3b at SemEval-2018 Task 12: Ensemble-based Deep Learning Models for Argument Reasoning Comprehension Task
Yongbin Li | Xiaobing Zhou
Proceedings of the 12th International Workshop on Semantic Evaluation

Reasoning is a crucial part of natural language argumentation. In order to comprehend an argument, we have to reconstruct and analyze its reasoning. In this task, given a natural language argument with a reason and a claim, the goal is to choose the correct implicit reasoning from two options, in order to form a reasonable structure of (Reason, Warrant, Claim). Our approach is to build distributed word embedding of reason, warrant and claim respectively, meanwhile, we use a series of frameworks such as CNN model, LSTM model, GRU with attention model and biLSTM with attention model for processing word vector. Finally, ensemble mechanism is used to integrate the results of each framework to improve the final accuracy. Experiments demonstrate superior performance of ensemble mechanism compared to each separate framework. We are the 11th in official results, the final model can reach a 0.568 accuracy rate on the test dataset.

2017

pdf abs
YNUDLG at SemEval-2017 Task 4: A GRU-SVM Model for Sentiment Classification and Quantification in Twitter
Ming Wang | Biao Chu | Qingxun Liu | Xiaobing Zhou
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Sentiment analysis is one of the central issues in Natural Language Processing and has become more and more important in many fields. Typical sentiment analysis classifies the sentiment of sentences into several discrete classes (e.g.,positive or negative). In this paper we describe our deep learning system(combining GRU and SVM) to solve both two-, three- and five-tweet polarity classifications. We first trained a gated recurrent neural network using pre-trained word embeddings, then we extracted features from GRU layer and input these features into support vector machine to fulfill both the classification and quantification subtasks. The proposed approach achieved 37th, 19th, and 14rd places in subtasks A, B and C, respectively.

pdf abs
YNUDLG at IJCNLP-2017 Task 5: A CNN-LSTM Model with Attention for Multi-choice Question Answering in Examinations
Min Wang | Qingxun Liu | Peng Ding | Yongbin Li | Xiaobing Zhou
Proceedings of the IJCNLP 2017, Shared Tasks

In this paper, we perform convolutional neural networks (CNN) to learn the joint representations of question-answer pairs first, then use the joint representations as the inputs of the long short-term memory (LSTM) with attention to learn the answer sequence of a question for labeling the matching quality of each answer. We also incorporating external knowledge by training Word2Vec on Flashcards data, thus we get more compact embedding. Experimental results show that our method achieves better or comparable performance compared with the baseline system. The proposed approach achieves the accuracy of 0.39, 0.42 in English valid set, test set, respectively.