Pseudo-label Data Construction Method and Syntax-enhanced Model for Chinese Semantic Error Recognition
Hongyan Wu, Nankai Lin, Shengyi Jiang, Lianxi Wang, Aimin Yang
Abstract
Chinese Semantic Error Recognition (CSER) has always been a weak link in Chinese language processing due to the complexity and obscurity of Chinese semantics. Existing research has gradually focused on leveraging pre-trained models to perform CSER. Although some researchers have attempted to integrate syntax information into the pre-trained language model, doing so requires training the models from scratch, which is time-consuming and laborious. Furthermore, despite the existence of datasets for CSER, their constrained size impairs the performance of the models. Thus, to address the difficulty posed by a limited sample set and the need to annotate samples with semantic-level errors, we propose a Pseudo-label Data Construction method for CSER (PDC-CSER), which generates pseudo-labels for augmented samples based on perplexity and on a model, respectively. This overcomes the difficulty of constructing pseudo-label data containing semantic-level errors and ensures the quality of the pseudo-labels. Moreover, we propose a CSER method with the Dependency Syntactic Attention mechanism (CSER-DSA) that explicitly infuses dependency syntactic information only in the fine-tuning stage, achieving robust performance while substantially reducing computing power and time cost. Results demonstrate that the pseudo-label technology PDC-CSER and the semantic error recognition method CSER-DSA surpass the existing models.
- Anthology ID:
- 2025.coling-main.361
- Volume:
- Proceedings of the 31st International Conference on Computational Linguistics
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, UAE
- Editors:
- Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
- Venue:
- COLING
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5391–5402
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-main.361/
- Cite (ACL):
- Hongyan Wu, Nankai Lin, Shengyi Jiang, Lianxi Wang, and Aimin Yang. 2025. Pseudo-label Data Construction Method and Syntax-enhanced Model for Chinese Semantic Error Recognition. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5391–5402, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- Pseudo-label Data Construction Method and Syntax-enhanced Model for Chinese Semantic Error Recognition (Wu et al., COLING 2025)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-main.361.pdf
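
The abstract mentions generating pseudo-labels for augmented samples based on perplexity but gives no further detail. As a rough illustration of the general idea only, the sketch below scores sentences with a toy character-bigram language model and flags those whose perplexity exceeds a threshold. The language model, the add-one smoothing, the threshold value, and the function names `train_bigram`, `perplexity`, and `pseudo_label` are all assumptions made for this example; they are not taken from the paper, which may use a different model and decision rule.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count character bigrams over a corpus (toy stand-in for a real LM)."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in corpus:
        chars = ["<s>"] + list(sent)  # "<s>" marks the sentence start
        vocab.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    # +1 reserves probability mass for characters never seen in training
    return unigrams, bigrams, len(vocab) + 1


def perplexity(sent, model):
    """Per-character perplexity under the add-one-smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    chars = ["<s>"] + list(sent)
    log_prob = 0.0
    for prev, cur in zip(chars, chars[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n = max(len(chars) - 1, 1)  # avoid division by zero on empty input
    return math.exp(-log_prob / n)


def pseudo_label(sent, model, threshold):
    """Return 1 (suspected error) if perplexity exceeds the threshold, else 0."""
    return 1 if perplexity(sent, model) > threshold else 0


if __name__ == "__main__":
    # Hypothetical miniature corpus, for illustration only
    corpus = ["我喜欢读书", "我喜欢音乐", "他喜欢读书"]
    model = train_bigram(corpus)
    for sent in ["我喜欢读书", "书读欢喜我"]:
        print(sent, round(perplexity(sent, model), 2), pseudo_label(sent, model, 8.0))
```

In practice the threshold would need to be tuned on held-out data, and a pre-trained language model would replace the bigram counts; the point here is only the shape of a perplexity-based labelling rule.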