Kongqiang Wang
Also published as: Wang Kongqiang
2026
wangkongqiang at SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures
Wang Kongqiang | Zhang Peng | Tan Qingli
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system for SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures, covering Subtask 1: Short Answer Questions (SAQ) and Subtask 2: Multiple-Choice Questions (MCQ). We evaluate models’ cultural competence across 26 languages and 30 countries using four large language models (LLMs): deepseek-v3.2-exp, qwen-max, qwen-plus, and qwen3-next-80b-a3b-instruct. We 1) analyze the trial and test datasets visually, 2) prompt the generative models to generate or select the answer they deem correct on the trial and test datasets, and 3) evaluate several prompt engineering approaches on the trial dataset. We further study the influence of different hyperparameters on the generative models and select the best single model for prediction on the test dataset. Subtask 1 (SAQ) is scored by accuracy aggregated over 23 languages (ar-EG, ar-MA, ar-SA, bg-BG, el-GR, en-AU, and so on). Subtask 2 (MCQ) is a multiple-choice task over question text, scored by accuracy based on the correctness of the selected answer across languages and cultural contexts. On the test dataset, our best approach achieved accuracy scores of 51.4689 on Subtask 1 (SAQ) and 80.26 on Subtask 2 (MCQ). The organizers use the aggregate accuracy score for the final ranking. Our submission achieved a good ranking on the test dataset leaderboard, and overall our approach yielded good results.
wangkongqiang at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization
Wang Kongqiang | Tan Qingli
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system for SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization, covering Subtask 1: Multilingual Text Classification Challenge - Polarization Detection, Subtask 2: Multilingual Text Classification Challenge - Polarization Type Classification, and Subtask 3: Multilingual Text Classification Challenge - Manifestation Identification. We focus on English and Spanish and use two pre-trained language models: models–google-bert–bert-base-uncased and models–microsoft–deberta-v3-base. We 1) analyze the training data visually, 2) use the gemma-3-27b-it generative model to augment the training dataset through prompts, and 3) train multiple single models on the training data. We further study the influence of different hyperparameters on the single models and select the best single model for prediction on the test set. All subtasks are evaluated using the Macro F1 score across languages and cultural contexts, which the organizers also use for the final ranking. For Subtask 1, our Macro F1 scores for English and Spanish are 0.7805 and 0.7155, respectively. For Subtask 2, they are 0.2603 and 0.4647, respectively. For Subtask 3, they are 0.2766 and 0.3322, respectively. Our submission achieved a good ranking on the test set, and overall our approach yielded good results.
wangkongqiang at SemEval-2026 Task 10: PsyCoMark- Psycholinguistic Conspiracy Marker Extraction and Detection
Wang Kongqiang | Tan Qingli
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system for SemEval-2026 Task 10: PsyCoMark Psycholinguistic Conspiracy Marker Extraction and Detection, covering Subtask 1: Conspiracy Marker Extraction and Subtask 2: Conspiracy Detection. We focus on English and use four pre-trained language models: models–distilbert–distilbert-base-uncased, models–distilbert–distilbert-base-multilingual-cased, models–lxyuan–distilbert-base-multilingual-cased-sentiments-student, and models–microsoft–deberta-v3-base. We 1) analyze the training data visually, 2) use the gemma-3-27b-it generative model to augment the training dataset through prompts for Subtask 2: Conspiracy Detection, and 3) train multiple single models on the training data. We further study the influence of different hyperparameters on the single models and select the best single model for prediction on the test set. Subtask 1 is scored by the Macro F1 score aggregated over four markers: Actor, Action, Effect, and Victim. Subtask 2 is essentially a binary text classification task; the task specifies a macro-averaged F1 score, and the result we report for it is a Weighted F1 score across different sentences and cultural contexts. Our best approach achieved a Macro F1 score of 0.1587 on Subtask 1 and a Weighted F1 score of 0.7411 on Subtask 2. For the final ranking, the organizers use the aggregate results of the Macro F1 and Weighted F1 scores. Our submission achieved a good ranking on the test set leaderboard, and overall our approach yielded good results.
wangkongqiang at SemEval-2026 Task 1: MWAHAHA- Competition on Humor Generation
Wang Kongqiang | Zhang Peng | Tan Qingli
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system for SemEval-2026 Task 1: MWAHAHA-Competition on Humor Generation. Subtask A: Text-based Humor Generation asks systems to generate a joke given a set of text-based constraints, and is conducted in English, Spanish, and Chinese. Subtask B: Image-Based Caption Generation explores humor in a multimodal context, combining visual inputs with text generation, and is in English only. We mainly focus on Subtask A in English and Chinese and on Subtask B in English, using two main models: BLIP and Qwen-series LLMs. For Subtask B, we address Task B1: Image-only Humor Generation and Task B2: Image and Prompt Humor Generation. All subtasks are evaluated using a Rating score with a 95% CI across languages and modalities, which the organizers also use for the final ranking. For Subtask A, our English and Chinese submissions obtained Rating scores of 950 and 1054, with 95% CIs of [922, 982] and [1024, 1104], ranking 16th and 1st, respectively. For Subtask B, our B1 and B2 submissions obtained Rating scores of 976 and 987, with 95% CIs of [941, 1007] and [948, 1016], ranking 5th and 3rd, respectively. Our submission achieved a good ranking on the test set, and overall our approach yielded good results.
wangkongqiang@EEUCA 2026: Multimodal Identification of Vaccine Critical Content on Social Media
Kongqiang Wang | Peng Zhang | Quingli Tan
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)
Our team was interested in content classification and labeling for multimodal detection of vaccine-critical memes on social media. We joined the shared task on Multimodal Identification of Vaccine Critical Content on Social Media at EEUCA, held with ACL 2026. The goal of the task is to assign a content classification label to vaccine-related discourse, that is, to develop systems that can classify the intent of a vaccine-related meme. The dataset has three labels: Vaccine critical (0), Neutral (1), and Pro-vaccine (2). Systems are ranked by macro F1 score. The shared task is based on the VaxMeme dataset, a collection of over 10,000 manually annotated vaccination-related memes designed to support multimodal vaccine-critical meme detection. We used supervised learning, fine-tuning pre-trained models and large language models (LLMs), including Qwen2 and Llama-series LLMs, with Llama-Factory. Our best result on the test set came from fine-tuning the qwen2_1.5B LLM: a Macro F1 score of 0.8153, Accuracy of 0.8185, Precision (Macro) of 0.8151, and Recall (Macro) of 0.8159, ranking 12th among all teams. The complete code for this project can be found at our GitHub address.
wangkongqiang@EEUCA 2026: Understanding Toxic Behavioral Intent in Gaming Chat Logs
Kongqiang Wang | Peng Zhang | Quingli Tan
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)
Our team was interested in content classification and labeling for toxicity detection in the chat logs of online gaming communities. We joined the shared task on Understanding Toxic Behavioral Intent in Gaming Chat Logs at EEUCA, held with ACL 2026. The goal of the task is to assign a content classification label to a player’s utterance (e.g., Hate and Harassment, Threats, Non-toxic), that is, to develop systems that can classify the intent of a player’s utterance. The dataset has six labels: Non-toxic (0), Insults and Flaming (1), Other Offensive Texts (2), Hate and Harassment (3), Threats (4), and Extremism (5). Systems are ranked by macro F1 score. The task uses 53,000 game chat utterances from World of Tanks. We used supervised learning with multiple pre-trained models and fine-tuned Qwen2 LLMs. Our best result on the test set came from fine-tuning the qwen2_7B LLM: a Macro F1 score of 0.5776, Accuracy of 0.9075, Precision (Macro) of 0.6847, and Recall (Macro) of 0.5343, ranking 8th among all teams. The complete code for this project can be found at our GitHub address.
2025
wangkongqiang at SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Wang Kongqiang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents our system for SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection, on Track A: Multi-label Emotion Detection. Given a target text snippet, the task is to predict the perceived emotion(s) of the speaker, deciding whether each of the following emotions applies: joy, sadness, fear, anger, surprise, or disgust. We focus on English and on source language selection strategies with four pre-trained language models: google-bert, FacebookAI-roberta, dccuchile-bert, and distilbert-multi. We 1) analyze the training data visually, 2) train multiple single models on the training data, and 3) combine multiple single models through weighted-voting ensemble learning. We further study the influence of different hyperparameters on the ensemble and select the best ensemble model for prediction on the test set. On the test set, our submission achieved an Emotion Macro F1 score of 0.6998 and an Emotion Micro F1 score of 0.7374. The organizers use the Macro F1 score for the final ranking. Our submission achieved a good ranking on the test set, and overall our approach yielded good results.
wangkongqiang@CASE 2025: Detection and Classifying Language and Targets of Hate Speech using Auxiliary Text Supervised Learning
Wang Kongqiang | Zhang Peng
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts
Our team was interested in content classification and labeling for multimodal detection of hate speech, humor, and stance in marginalized socio-political movement discourse. We joined two subtasks: Subtask A (Detection of Hate Speech) and Subtask B (Classifying the Targets of Hate Speech). In both, the goal is to assign a content classification label to multimodal hate speech. Subtask A aims to detect the presence of hate speech in images; its dataset has binary labels: No Hate and Hate. Subtask B aims to identify the targets of hate speech, given that an image is hateful; its dataset has four labels: Undirected, Individual, Community, and Organization. We used a supervised learning method and a text prediction model. Our best results on the test set were F1 scores of 0.6209 for Subtask A and 0.3453 for Subtask B, ranking twentieth and thirteenth among all teams, respectively.