Xue Wan


2025

PATeam at SemEval-2025 Task 9: LLM-Augmented Fusion for AI-Driven Food Safety Hazard Detection
Xue Wan | Fengping Su | Ling Sun | Yuyang Lin | Pengfei Chen
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper introduces the approach we adopted for the SemEval-2025 “Food Hazard Detection” task, which aims to predict coarse-grained categories (such as “product category” and “hazard category”) and fine-grained vectors (such as specific products like “ice cream” or hazards like “salmonella”) from noisy, long-tailed text data. To address dirty data and the severe long-tail distribution of both labels and text length, we propose a pipeline system that combines data cleaning, LLM-based enhancement, label resampling, and ensemble learning to tackle data sparsity and label imbalance. The two subtasks are closely related semantically, so we integrate them into a unified multi-turn dialogue framework and fine-tune five models using a bagging approach. We achieved good results in both subtasks, ranking 5th (with F1 scores of 80.17% for ST1 and 52.66% for ST2).

PATeam at SemEval-2025 Task 10: Two-stage News Analytical Framework: Target-oriented Semantic Segmentation and Sequence Generation LLMs for Cross-Lingual Entity and Narrative Analysis
Ling Sun | Xue Wan | Yuyang Lin | Fengping Su | Pengfei Chen
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents our approaches for the three subtasks of SemEval-2025 Task 10, which focus on entity framing, narrative classification, and narrative extraction in news analysis, respectively. We propose a two-stage news analytical framework for both Subtasks A and B. In Subtask A (Entity Framing), we design an entity-oriented data processing pipeline to address the issue of redundant information in a news article, and explore the effective use of multilingual datasets through extensive experiments. The system achieves first place in Bulgarian and second place in English and Portuguese. In Subtask B (Narrative Classification), a similar narrative-oriented data processing pipeline is adopted to obtain condensed news chunks for each narrative. We discuss in depth approaches to improving both data quality and volume, and explore one-vs-rest classification models and sequence prediction models for multi-label classification. The system ranks first in Bulgarian and second in Russian and Portuguese. In Subtask C (Narrative Extraction), we build our system with data augmentation, supervised fine-tuning, and preference-based reinforcement learning. This system achieves first place in Bulgarian, Russian, and Hindi and second place in Portuguese.