2025
PATeam at SemEval-2025 Task 9: LLM-Augmented Fusion for AI-Driven Food Safety Hazard Detection
Xue Wan | Fengping Su | Ling Sun | Yuyang Lin | Pengfei Chen
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper introduces the approach we adopted for the SemEval-2025 “Food Hazard Detection” task, which aims to predict coarse-grained categories (such as “product category” and “hazard category”) and fine-grained vectors (such as specific products like “ice cream” or hazards like “salmonella”) from noisy, long-tailed text data. To address dirty data and the severe long-tail distribution of both labels and text lengths, we propose a pipeline system that combines data cleaning, LLM-based augmentation, label resampling, and ensemble learning to tackle data sparsity and label imbalance. The two subtasks are strongly semantically related, so we integrated them into a unified multi-turn dialogue framework and fine-tuned five models using a bagging approach. Ultimately, we achieved good results on both subtasks, ranking 5th (with an F1 score of 80.17% for ST1 and 52.66% for ST2).
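As a rough illustration of the bagging step described in this abstract, the sketch below resamples the training data with replacement, fine-tunes one model per resample, and combines predictions by majority vote. The `fine_tune` callable and the five-model count mirror the abstract; all names and structure are a hypothetical reconstruction for illustration, not the authors' code.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # One bagging round: sample len(data) examples with replacement.
    return [rng.choice(data) for _ in range(len(data))]

def bagging_ensemble(data, fine_tune, n_models=5, seed=0):
    # Fine-tune one model per bootstrap resample (five in the paper).
    rng = random.Random(seed)
    return [fine_tune(bootstrap(data, rng)) for _ in range(n_models)]

def majority_vote(models, example):
    # Aggregate the ensemble's labels; ties break by first-seen order.
    votes = Counter(model(example) for model in models)
    return votes.most_common(1)[0][0]

# Toy usage: a trivial "fine_tune" that memorizes its sample's majority label.
data = [("text a", "hazard"), ("text b", "hazard"), ("text c", "no-hazard")]
fine_tune = lambda sample: (
    lambda example, label=Counter(l for _, l in sample).most_common(1)[0][0]: label
)
models = bagging_ensemble(data, fine_tune)
print(majority_vote(models, "text d"))
```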
PATeam at SemEval-2025 Task 10: Two-stage News Analytical Framework: Target-oriented Semantic Segmentation and Sequence Generation LLMs for Cross-Lingual Entity and Narrative Analysis
Ling Sun | Xue Wan | Yuyang Lin | Fengping Su | Pengfei Chen
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents our approaches to the three subtasks of SemEval-2025 Task 10, which focus on entity framing, narrative classification, and narrative extraction in news analysis, respectively. We propose a two-stage news analytical framework for Subtasks A and B. In Subtask A (Entity Framing), we design an entity-oriented data processing pipeline to address redundant information in news articles, and explore the effective use of multilingual datasets through extensive experiments. The system achieves first place in Bulgarian and second place in English and Portuguese. In Subtask B (Narrative Classification), a similar narrative-oriented data processing pipeline is adopted to obtain condensed news chunks for each narrative. We discuss in depth how to enhance both data quality and volume, and explore one-vs-rest classification models and sequence prediction models for the multi-label classification task; a small one-vs-rest sketch follows this abstract. The system ranks first in Bulgarian and second in Russian and Portuguese. In Subtask C (Narrative Extraction), we build our system with data augmentation, supervised fine-tuning, and preference-based reinforcement learning. It achieves first place in Bulgarian, Russian, and Hindi, and second place in Portuguese.
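The one-vs-rest formulation mentioned for Subtask B can be illustrated with a small scikit-learn sketch: one binary classifier per narrative label over condensed news chunks. The TF-IDF features, logistic-regression base model, and toy texts and labels below are stand-ins for illustration only, not the system described in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy multi-label data: each condensed chunk can carry several narratives.
texts = [
    "chunk about narrative A",
    "chunk about narratives A and B",
    "chunk about narrative B",
]
labels = [["narrative_A"], ["narrative_A", "narrative_B"], ["narrative_B"]]

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(labels)  # one binary column per narrative

# One-vs-rest trains an independent binary classifier per narrative label.
clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, y)
pred = clf.predict(["another chunk about narrative B"])
print(binarizer.inverse_transform(pred))
```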
2020
Compress Polyphone Pronunciation Prediction Model with Shared Labels
Pengfei Chen | Lina Wang | Hui Di | Kazushige Ouchi | Lvhong Wang
Proceedings of the 19th Chinese National Conference on Computational Linguistics
It is well known that deep learning models have huge numbers of parameters and are computationally expensive, especially on embedded and mobile devices. Polyphone pronunciation selection is a basic function for Chinese Text-to-Speech (TTS) applications, and recurrent neural networks (RNNs) are a good sequence-labeling solution for it. However, their large parameter and computation costs make compression necessary. In contrast to existing approaches such as low-precision quantization and projection layers, we propose a novel method based on shared labels, which focuses on compressing the fully-connected layer before the Softmax in models with a huge number of labels, as in TTS polyphone selection. The basic idea is to map the large set of target labels into a few label clusters that share the parameters of the fully-connected layer. We further combine this with other methods to compress the polyphone pronunciation selection model even more. Experimental results show that for Bi-LSTM (Bidirectional Long Short-Term Memory) based polyphone selection, the shared-labels model reduces the original model size by about 52% and accelerates prediction by 44%, almost without performance loss. The proposed method can also be applied to other tasks to compress models and accelerate computation.
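The shared-labels idea can be sketched as a label-to-cluster assignment: candidate pronunciations of the same character must land in different clusters so a predicted cluster stays unambiguous for that character, while labels of different characters may reuse cluster ids, shrinking the final fully-connected layer from one output per label to one per cluster. The greedy graph coloring below is our reading of the abstract, not the authors' algorithm; all names are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def assign_shared_labels(candidates_per_char):
    # Conflict graph: candidate pronunciations of the same character
    # must not share a cluster, or prediction would be ambiguous.
    conflicts = defaultdict(set)
    for labels in candidates_per_char.values():
        for a, b in combinations(set(labels), 2):
            conflicts[a].add(b)
            conflicts[b].add(a)
    # Greedy graph coloring: each color becomes one shared cluster id.
    label_to_cluster = {}
    all_labels = sorted({l for ls in candidates_per_char.values() for l in ls})
    for label in all_labels:
        used = {label_to_cluster[n] for n in conflicts[label]
                if n in label_to_cluster}
        cluster = 0
        while cluster in used:
            cluster += 1
        label_to_cluster[label] = cluster
    return label_to_cluster

# Toy example: six pronunciations collapse into two shared clusters,
# so the fully-connected layer needs 2 outputs instead of 6.
candidates = {
    "乐": ["le4", "yue4"],
    "行": ["xing2", "hang2"],
    "还": ["hai2", "huan2"],
}
clusters = assign_shared_labels(candidates)
print(clusters)
print("clusters needed:", max(clusters.values()) + 1)
```

At inference, the predicted cluster id is mapped back to a concrete pronunciation via the current character's candidate list, which is unambiguous because candidates of one character never share a cluster.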