Haotan Guo
2026
Narrative Nexus at SemEval-2026 Task 4: Modeling Narrative Similarity via Instruction-Based Fine-Tuning and Synthetic Data Augmentation
Haotan Guo | Hongbin Na | Zimu Wang | Wei Wang
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Narrative similarity assessment requires models to reason beyond surface-level lexical overlap and capture higher-level plot structures and thematic relationships. In this paper, we address SemEval-2026 Task 4 Track A: Narrative Story Similarity by reformulating it as an instruction-following generation problem. We employ parameter-efficient fine-tuning via LoRA to adapt pretrained large language models for triplet-based narrative comparison. To overcome the limitations imposed by the scarcity of human-annotated data, we further incorporate synthetic triplet samples generated by a large language model for data augmentation. Experimental results demonstrate that our fine-tuned Qwen2.5-7B model achieves competitive performance, outperforming the zero-shot GPT-4o-mini baseline. These findings underscore the effectiveness of task-specific adaptation combined with synthetic data augmentation for narrative similarity modeling.
2025
Lost in Pronunciation: Detecting Chinese Offensive Language Disguised by Phonetic Cloaking Replacement
Haotan Guo | Jianfei He | Jiayuan Ma | Hongbin Na | Zimu Wang | Haiyang Zhang | Qi Chen | Wei Wang | Zijing Shi | Tao Shen | Ling Chen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Phonetic Cloaking Replacement (PCR), defined as the deliberate use of homophonic or near-homophonic variants to hide toxic intent, has become a major obstacle to Chinese content moderation. While this problem is well-recognized, existing evaluations predominantly rely on rule-based, synthetic perturbations that ignore the creativity of real users. We organize PCR into a four-way surface-form taxonomy and compile PCR-ToxiCN, a dataset of 500 naturally occurring, phonetically cloaked offensive posts gathered from the RedNote platform. Benchmarking state-of-the-art LLMs on this dataset exposes a serious weakness: the best model reaches only an F1-score of 0.672, and zero-shot chain-of-thought prompting pushes performance even lower. Guided by error analysis, we revisit a Pinyin-based prompting strategy that earlier studies judged ineffective and show that it recovers much of the lost accuracy. This study offers the first comprehensive taxonomy of Chinese PCR, a realistic benchmark that reveals current detectors’ limits, and a lightweight mitigation technique that advances research on robust toxicity detection.