Yongwei Zhang

2025

SRCB at SemEval-2025 Task 9: LLM Finetuning Approach based on External Attention Mechanism in The Food Hazard Detection
Yuming Zhang | Hongyu Li | Yongwei Zhang | Shanshan Jiang | Bin Dong
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper reports on the performance of SRCB’s system in SemEval-2025 Task 9: The Food Hazard Detection Challenge. We develop a pipeline system consisting of two parts: 1. a Candidate Recall Module, which uses a BERT model to select the most probable correct labels from a large label set; 2. an LLM Prediction Module, which generates the final prediction using Large Language Models (LLMs). Additionally, to address the issue of long prompts caused by an excessive number of labels, we propose a model architecture to reduce resource consumption and improve performance. Our submission achieves a macro-F1 score of 80.39 on Sub-Task 1 and a macro-F1 score of 54.73 on Sub-Task 2. Our system is released at https://github.com/Doraxgui/Document_Attention

2024

SRCB at #SMM4H 2024: Making Full Use of LLM-based Data Augmentation in Adverse Drug Event Extraction and Normalization
Hongyu Li | Yuming Zhang | Yongwei Zhang | Shanshan Jiang | Bin Dong
Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

This paper reports on the performance of SRCB’s system in the Social Media Mining for Health (#SMM4H) 2024 Shared Task 1: extraction and normalization of adverse drug events (ADEs) in English tweets. We develop a system composed of an ADE extraction module and an ADE normalization module, the latter of which further includes a retrieval module and a filtering module. To alleviate the data imbalance and other issues introduced by the dataset, we employ 4 data augmentation techniques based on Large Language Models (LLMs) across both modules. Our best submission achieves an F1 score of 53.6 (49.4 on the unseen subset) on the ADE normalization task and an F1 score of 52.1 on the ADE extraction task.

2023

SRCB at SemEval-2023 Task 2: A System of Complex Named Entity Recognition with External Knowledge
Yuming Zhang | Hongyu Li | Yongwei Zhang | Shanshan Jiang | Bin Dong
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The MultiCoNER II shared task aims at detecting semantically ambiguous and complex named entities in short and low-context settings for multiple languages. The lack of context makes the recognition of ambiguous named entities challenging. To alleviate this issue, our team SRCB proposes an external-knowledge-based system that utilizes 3 different types of external knowledge retrieved in different ways. Given an original text, our system retrieves the possible labels and the descriptions for each potential entity detected by a mention detection model. We also retrieve a related document from Wikipedia as extra context for each original text. We concatenate the original text with the external knowledge as the input of NER models. The informative contextual representations with external knowledge significantly improve the NER performance in both Chinese and English tracks. Our system wins 3rd place in the Chinese track and 6th place in the English track.

2018

Ling@CASS Solution to the NLP-TEA CGED Shared Task 2018
Qinan Hu | Yongwei Zhang | Fang Liu | Yueguo Gu
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

In this study, we employ sequence-to-sequence learning to model the task of grammar error correction. The system takes potentially erroneous sentences as inputs, and outputs correct sentences. To break through the bottleneck of the very limited size of manually labeled data, we adopt a semi-supervised approach. Specifically, we adapt correct sentences written by native Chinese speakers to generate pseudo grammatical errors of the kind made by learners of Chinese as a second language. We use the pseudo data to pre-train the model, and the CGED data to fine-tune it. Being aware of the significance of precision in a grammar error correction system in real scenarios, we use ensembles to boost precision. When using inputs as simple as Chinese characters, the ensembled system achieves a precision of 86.56% in the detection of erroneous sentences, and a precision of 51.53% in the correction of errors of the Selection and Missing types.

CMMC-BDRC Solution to the NLP-TEA-2018 Chinese Grammatical Error Diagnosis Task
Yongwei Zhang | Qinan Hu | Fang Liu | Yueguo Gu
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

Chinese grammatical error diagnosis is an important natural language processing (NLP) task, and also an important application of artificial intelligence technology in language education. This paper introduces a system developed by the Chinese Multilingual & Multimodal Corpus and Big Data Research Center for the NLP-TEA shared task, named Chinese Grammar Error Diagnosis (CGED). This system treats the error diagnosis task as a sequence tagging problem and the correction task as a text classification problem. Among the 12 teams, this system achieves the highest F1 score in the detection task and the second highest mean F1 score across the identification, position, and correction tasks.

Co-authors

Qinan Hu 2

Fang Liu 2

