Tan Qingli
2026
wangkongqiang at SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures
Wang Kongqiang | Zhang Peng | Tan Qingli
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system developed for SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures, covering Subtask 1: Short Answer Questions (SAQ) and Subtask 2: Multiple-Choice Questions (MCQ). We focus on models’ cultural competence across 26 languages and 30 countries using four large language models (LLMs): deepseek-v3.2-exp, qwen-max, qwen-plus, and qwen3-next-80b-a3b-instruct. We experiment with 1) visually analyzing the trial and test datasets, 2) prompting the generative LLMs to generate or select the answer they deem correct on the trial and test datasets, and 3) evaluating several prompt engineering approaches on the trial dataset. We further study the influence of different hyperparameters on the generative model and select the best single model for prediction on the test dataset. Our submission achieved a good ranking on the test dataset leaderboard. For Subtask 1 (SAQ), evaluation is mainly based on the aggregate results over 23 languages (ar-EG, ar-MA, ar-SA, bg-BG, el-GR, en-AU, and so on), measured by accuracy. Subtask 2 (MCQ) is essentially a multiple-choice task over question text, evaluated by accuracy based on the correctness of the selected answer across different languages and cultural contexts. On the test dataset, our best approach obtains an accuracy score of 51.4689 on Subtask 1 (SAQ) and 80.26 on Subtask 2 (MCQ). The final ranking uses the aggregate accuracy score. Overall, our approach yields good results.
wangkongqiang at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization
Wang Kongqiang | Tan Qingli
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system developed for SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization, covering Subtask 1: Multilingual Text Classification Challenge - Polarization Detection, Subtask 2: Multilingual Text Classification Challenge - Polarization Type Classification, and Subtask 3: Multilingual Text Classification Challenge - Manifestation Identification. We focus on English and Spanish and use two pre-trained language models: models–google-bert–bert-base-uncased and models–microsoft–deberta-v3-base. We experiment with 1) visually analyzing the training set, 2) using the gemma-3-27b-it generative model to augment the training set through prompts, and 3) training multiple single models on the training set. We further study the influence of different hyperparameters on the single model and select the best single model for prediction on the test set. Our submission achieved a good ranking on the test set. All subtasks are evaluated using the Macro F1 score across different languages and cultural contexts. For Subtask 1, our Macro F1 scores for English and Spanish are 0.7805 and 0.7155, respectively. For Subtask 2, they are 0.2603 and 0.4647, respectively. For Subtask 3, they are 0.2766 and 0.3322, respectively. The final ranking uses the Macro F1 score. Overall, our approach yields good results.
wangkongqiang at SemEval-2026 Task 10: PsyCoMark- Psycholinguistic Conspiracy Marker Extraction and Detection
Wang Kongqiang | Tan Qingli
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system developed for SemEval-2026 Task 10: PsyCoMark - Psycholinguistic Conspiracy Marker Extraction and Detection, covering Subtask 1: Conspiracy Marker Extraction and Subtask 2: Conspiracy Detection. We focus on English and use four pre-trained language models: models–distilbert–distilbert-base-uncased, models–distilbert–distilbert-base-multilingual-cased, models–lxyuan–distilbert-base-multilingual-cased-sentiments-student, and models–microsoft–deberta-v3-base. We experiment with 1) visually analyzing the training set, 2) using the gemma-3-27b-it generative model to augment the training set through prompts for Subtask 2: Conspiracy Detection, and 3) training multiple single models on the training set. We further study the influence of different hyperparameters on the single model and select the best single model for prediction on the test set. Our submission achieved a good ranking on the test set leaderboard. For Subtask 1, evaluation is mainly based on the aggregate results over the four markers (Actor, Action, Effect, and Victim), measured by the Macro F1 score. Subtask 2 is essentially a binary text classification task. Its performance is evaluated using the macro-averaged F1 score; our result for this subtask is reported as a Weighted F1 score across different sentences and cultural contexts. Our best approach obtains a Macro F1 score of 0.1587 on Subtask 1 and a Weighted F1 score of 0.7411 on Subtask 2. The final ranking uses the aggregate results of the Macro F1 score and the Weighted F1 score. Overall, our approach yields good results.
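The abstract above mentions both macro-averaged and Weighted F1 for a binary classification subtask. As a minimal sketch of how the two averages differ (this is not the task's official scorer; the function names and the toy labels are hypothetical and chosen only for illustration), macro F1 averages the per-class F1 scores equally, while weighted F1 weights each class by its number of true instances:

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by class support in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: four negatives, two positives.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    print("macro F1:   ", macro_f1(y_true, y_pred))
    print("weighted F1:", weighted_f1(y_true, y_pred))
```

On imbalanced data the weighted average leans toward the majority class, so a Weighted F1 score and a Macro F1 score for the same predictions are generally not interchangeable.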
wangkongqiang at SemEval-2026 Task 1: MWAHAHA- Competition on Humor Generation
Wang Kongqiang | Zhang Peng | Tan Qingli
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system developed for SemEval-2026 Task 1: MWAHAHA - Competition on Humor Generation. Subtask A: Text-based Humor Generation asks systems to generate a joke given a set of text-based constraints, and is conducted in English, Spanish, and Chinese. Subtask B: Image-Based Caption Generation explores humor in a multimodal context, combining visual inputs with text generation, and is in English only. We mainly focus on Subtask A in English and Chinese and on Subtask B in English, using two kinds of models: BLIP and Qwen-series LLMs. For Subtask B, we address both Task B1: Image-only Humor Generation and Task B2: Image and Prompt Humor Generation. Our submission achieved a good ranking on the test set. All subtasks are evaluated using a Rating score with a 95% CI across different languages and modality contexts. For Subtask A, we obtain a Rating score of 950 in English (95% CI [922, 982], ranked 16th) and 1054 in Chinese (95% CI [1024, 1104], ranked 1st). For Subtask B, we obtain a Rating score of 976 on B1 (95% CI [941, 1007], ranked 5th) and 987 on B2 (95% CI [948, 1016], ranked 3rd). The final ranking uses the Rating (95% CI) score. Overall, our approach yields good results.