@inproceedings{kongqiang-etal-2026-wangkongqiang-semeval,
title = "wangkongqiang at {S}em{E}val-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures",
author = "Kongqiang, Wang and
Peng, Zhang and
Qingli, Tan",
editor = "Kochmar, Ekaterina and
Ghosh, Debanjan and
North, Kai and
Komachi, Mamoru",
booktitle = "Proceedings of the 20th {I}nternational {W}orkshop on {S}emantic {E}valuation (2026)",
month = jul,
year = "2026",
address = "San Diego, California, USA",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.36/",
pages = "247--255",
ISBN = "979-8-89176-414-9",
abstract = "This paper presents our system for SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures, covering Subtask 1: Short Answer Questions (SAQ) and Subtask 2: Multiple-Choice Questions (MCQ). We examine models' cultural competence across 26 languages and 30 countries using four large language models (LLMs): deepseek-v3.2-exp, qwen-max, qwen-plus, and qwen3-next-80ba3b-instruct. Our experiments 1) analyze the trial and test datasets visually, 2) prompt the generative LLMs to generate or select the answer they deem correct on the trial and test datasets, and 3) evaluate several prompt engineering approaches on the trial dataset. We further study the influence of different hyperparameters on the generative models and select the best single model for prediction on the test dataset. Subtask 1 (SAQ) is scored by accuracy, aggregated over 23 languages (ar-EG, ar-MA, ar-SA, bg-BG, el-GR, en-AU, and so on). Subtask 2 (MCQ) is a multiple-choice task over question text, also scored by accuracy, based on the correctness of the selected answer across different languages and cultural contexts. On the test dataset, our best approach achieves an accuracy of 51.4689 on Subtask 1 (SAQ) and 80.26 on Subtask 2 (MCQ). The organizers use the aggregate accuracy for the final ranking, and our submission ranked well on the test dataset leaderboard."
}