Ning Li


2025

Information extraction (IE) in specialized domains such as computer science and chemistry is challenged by the poor generalization of traditional models and the knowledge deficits of general-purpose Large Language Models (LLMs). We introduce a robust, LLM-based framework with two core contributions: (1) an end-to-end training and inference paradigm that combines continual pre-training (CPT) for knowledge injection, supervised fine-tuning (SFT) for task alignment, and retrieval-augmented generation (RAG) for inference-time enhancement; and (2) a novel LLM-assisted data annotation pipeline for the efficient creation of high-quality training data. Comprehensive experiments demonstrate that while fine-tuning alone yields strong in-domain performance, our complete framework exhibits superior robustness and generalization, consistently achieving state-of-the-art results in challenging domain-shift and novel-schema scenarios and validating our integrated approach to building adaptable, high-performance domain-specific IE systems.
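The sketch below illustrates only the inference-time RAG step of such a pipeline, not the paper's released implementation: retrieve domain passages relevant to the input, prepend them together with the extraction schema to the prompt, and decode structured output with the CPT+SFT model. The checkpoint path, retriever model, corpus, and prompt template are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer, util

# Hypothetical checkpoint produced by the CPT -> SFT stages.
MODEL_PATH = "path/to/cpt-sft-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16, device_map="auto")

# Tiny in-memory corpus standing in for the domain knowledge base.
corpus = [
    "Suzuki coupling is a palladium-catalyzed cross-coupling reaction between a boronic acid and an organohalide.",
    "A transformer encoder applies multi-head self-attention followed by position-wise feed-forward layers.",
]
retriever = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = retriever.encode(corpus, convert_to_tensor=True)

def extract(text: str, schema: str, k: int = 2) -> str:
    """Retrieve top-k domain passages and ask the fine-tuned model for schema-conformant output."""
    query_emb = retriever.encode(text, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=k)[0]
    context = "\n".join(corpus[h["corpus_id"]] for h in hits)
    prompt = (f"Reference passages:\n{context}\n\n"
              f"Schema: {schema}\n\nText: {text}\n\nExtracted entities and relations:")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)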
A two-stage role classification model based on DeBERTa is proposed for the Entity Framing task in SemEval-2025 Task 10. The task poses multi-label, multi-class classification challenges under severe class imbalance, particularly the semantic overlap and data sparsity of fine-grained roles. Existing methods rely mainly on rules, traditional machine learning, or deep learning, yet accurate classification of fine-grained roles remains difficult. To address this, the proposed model leverages the deep semantic representations of the DeBERTa pre-trained language model through two sub-models, one for main-role classification and one for sub-role classification, and uses Focal Loss to mitigate class imbalance. Experimental results show that the model achieves 75.32% accuracy on main-role prediction, while the exact match rate for sub-roles is 8.94%, limited mainly by the strict exact-match criterion and the semantic overlap of fine-grained roles in the multi-label setting. Compared with the baseline's sub-role exact match rate of 3.83%, the proposed model improves this metric substantially. The model ultimately ranked 23rd on the leaderboard. The code for this paper is available at: https://github.com/jiyuaner/YNU-HPCC-at-SemEval-2025-Task10.
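The following is a minimal sketch, not the authors' released code, of the two-stage setup described above: a DeBERTa encoder with a single-label main-role head and a multi-label sub-role head, trained with Focal Loss to counter class imbalance. A single shared encoder stands in for the paper's two separate sub-models, and the model name, label counts, and hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiLabelFocalLoss(nn.Module):
    """Focal Loss over sigmoid outputs: FL = -alpha * (1 - p_t)^gamma * log(p_t)."""
    def __init__(self, alpha: float = 0.25, gamma: float = 2.0):
        super().__init__()
        self.alpha, self.gamma = alpha, gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        bce = nn.functional.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)        # probability assigned to the true label
        return (self.alpha * (1 - p_t) ** self.gamma * bce).mean()

class TwoStageRoleClassifier(nn.Module):
    def __init__(self, model_name: str = "microsoft/deberta-v3-base",
                 num_main_roles: int = 3, num_sub_roles: int = 22):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.main_head = nn.Linear(hidden, num_main_roles)  # single-label (softmax over main roles)
        self.sub_head = nn.Linear(hidden, num_sub_roles)    # multi-label (sigmoid over sub-roles)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                   # first-token representation as pooled output
        return self.main_head(cls), self.sub_head(cls)

# Usage sketch: cross-entropy for the main role, Focal Loss for the imbalanced sub-role labels.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = TwoStageRoleClassifier()
batch = tokenizer(["<entity> Acme Corp </entity> was blamed for the spill."],
                  return_tensors="pt", truncation=True, padding=True)
main_logits, sub_logits = model(batch["input_ids"], batch["attention_mask"])
main_loss = nn.functional.cross_entropy(main_logits, torch.tensor([1]))        # dummy main-role label
sub_loss = MultiLabelFocalLoss()(sub_logits, torch.zeros_like(sub_logits))     # dummy sub-role labels
loss = main_loss + sub_loss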