Dailin Li


2025

Empowering Persuasion Detection in Slavic Texts through Two-Stage Generative Reasoning
Xin Zou | Chuhan Wang | Dailin Li | Yanan Wang | Jian Wang | Hongfei Lin
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)

This paper presents our submission to Subtask 2 (multi-label classification of persuasion techniques) of the Shared Task on Detection and Classification of Persuasion Techniques in Slavic Languages at SlavNLP 2025. Our method leverages a teacher–student framework based on large language models (LLMs): a Qwen3 32B teacher model generates natural language explanations for annotated persuasion techniques, and a Qwen2.5 32B student model is fine-tuned to replicate both the teacher’s rationales and the final label predictions. We train our models on the official shared task dataset, supplemented by annotated resources from SemEval 2023 Task 3 and CLEF 2024 Task 3 covering English, Russian, and Polish to improve cross-lingual robustness. Our final system ranks 4th on BG, SI, and HR, and 5th on PL in terms of micro-F1 score among all participating teams.
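The teacher–student pipeline the abstract describes can be sketched as plain data construction: the teacher LLM is prompted to explain the gold annotations, and each student training example pairs the raw fragment with the rationale plus final labels. This is a minimal illustration; the prompt templates and label names are hypothetical, not the authors' actual ones.

```python
# Sketch of the two-stage data construction: a teacher explains the gold
# labels, and the student is fine-tuned to emit rationale + labels.
# Templates and label names below are illustrative assumptions.

def build_teacher_prompt(fragment, gold_labels):
    """Prompt the teacher LLM to justify the annotated techniques."""
    labels = ", ".join(gold_labels)
    return (f"Text: {fragment}\n"
            f"Annotated techniques: {labels}\n"
            "Explain, step by step, why each technique applies.")

def build_student_example(fragment, rationale, gold_labels):
    """Supervised pair: the student learns to replicate the teacher's
    rationale and then predict the final label set."""
    target = rationale + "\nFinal labels: " + ", ".join(gold_labels)
    return {"input": f"Text: {fragment}\nIdentify persuasion techniques.",
            "target": target}

example = build_student_example(
    "They will destroy everything we hold dear!",
    "The phrase 'destroy everything' is emotionally charged ...",
    ["Loaded_Language", "Appeal_to_Fear"],
)
```

In this setup, only the student (Qwen2.5 32B in the paper) is fine-tuned; the teacher (Qwen3 32B) is used once, offline, to generate the rationales.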

STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework
Wenhao Liu | Zhenyi Lu | Xinyu Hu | Jerry Zhang | Dailin Li | Jiacheng Cen | Huilin Cao | Haiteng Wang | Yuhan Li | Xie Kun | Dandan Li | Pei Zhang | Chengbo Zhang | Yuxiang Ren | Xiaohong Huang | Yan Ma
Findings of the Association for Computational Linguistics: ACL 2025

High-quality math datasets are crucial for advancing the reasoning abilities of large language models (LLMs). However, existing datasets often suffer from three key issues: outdated and insufficiently challenging content, neglect of human-like reasoning, and limited reliability due to single-LLM generation. To address these, we introduce STORM-BORN, an ultra-challenging dataset of mathematical derivations sourced from cutting-edge academic papers, which includes dense human-like approximations and heuristic cues. To ensure reliability and quality, we propose a novel human-in-the-loop, multi-agent data generation framework that integrates reasoning-dense filters, multi-agent collaboration, and evaluations by human mathematicians. We curated a set of 2,000 synthetic samples and deliberately selected the 100 most difficult problems. Even the most advanced models, such as GPT-o1, solved fewer than 5% of them. Fine-tuning on STORM-BORN boosts accuracy by 7.84% (LLaMA3-8B) and 9.12% (Qwen2.5-7B). As AI approaches mathematician-level reasoning, STORM-BORN provides both a high-difficulty benchmark and a human-like reasoning training resource. Our code and dataset are publicly available at https://github.com/lwhere/STORM-BORN.
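The "reasoning-dense filter" stage of the pipeline can be illustrated with a simple density heuristic: keep candidate passages whose rate of derivation cues exceeds a threshold. The cue list and threshold here are hypothetical stand-ins; the paper's actual filter is not specified in the abstract.

```python
# Hypothetical sketch of a reasoning-dense filter: score a passage by the
# density of derivation cues and keep it only above a threshold.
# Cue list and threshold are illustrative assumptions.

DERIVATION_CUES = ("therefore", "implies", "substituting",
                   "it follows", "by induction", "approximately")

def reasoning_density(text):
    """Cue occurrences per word, as a crude proxy for derivation density."""
    lowered = text.lower()
    hits = sum(lowered.count(cue) for cue in DERIVATION_CUES)
    return hits / max(len(lowered.split()), 1)

def keep(text, threshold=0.02):
    """Retain only passages dense enough in derivation cues."""
    return reasoning_density(text) >= threshold
```

In the full framework, passages that pass such a filter would then go through multi-agent generation and, finally, review by human mathematicians.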

2024

CoT-based Data Augmentation Strategy for Persuasion Techniques Detection
Dailin Li | Chuhan Wang | Xin Zou | Junlong Wang | Peng Chen | Jian Wang | Liang Yang | Hongfei Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Detecting persuasive communication is an important topic in Natural Language Processing (NLP), as it can help identify fake information on social media. We developed a system to identify the persuasion techniques applied in text fragments across four languages: English, Bulgarian, North Macedonian, and Arabic. Our system uses data augmentation methods and employs an ensemble strategy that combines the strengths of RoBERTa and DeBERTa models. Due to limited resources, we concentrated solely on Subtask 1, and our solution achieved the top ranking in the English track in the official assessment. We also analyse the impact of architectural decisions, data construction, and training strategies.
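One common way to combine two encoder classifiers for multi-label detection is to average their per-label probabilities and threshold the result. The sketch below shows that scheme with illustrative scores; the abstract does not state which combination rule the authors used, so treat this as an assumption.

```python
# Minimal sketch of a probability-averaging ensemble for multi-label
# persuasion-technique detection. Scores and the 0.5 threshold are
# illustrative, not outputs of the actual RoBERTa/DeBERTa models.

def ensemble_predict(roberta_probs, deberta_probs, threshold=0.5):
    """Average each label's probability across the two models and
    return the labels whose mean score clears the threshold."""
    kept = []
    for label in roberta_probs:
        mean = (roberta_probs[label] + deberta_probs[label]) / 2
        if mean >= threshold:
            kept.append(label)
    return sorted(kept)

preds = ensemble_predict(
    {"Loaded_Language": 0.8, "Doubt": 0.3},
    {"Loaded_Language": 0.7, "Doubt": 0.6},
)
# mean scores: Loaded_Language 0.75 (kept), Doubt 0.45 (dropped)
```

Averaging softens each model's individual errors: a label survives only if neither model is strongly against it, which is one reason such ensembles tend to improve micro-F1 over either model alone.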