Chinese Spoken Named Entity Recognition in Real-world Scenarios: Dataset and Approaches
Shilin Zhou, Zhenghua Li, Chen Gong, Lei Zhang, Yu Hong, Min Zhang
Abstract
Spoken Named Entity Recognition (NER) aims to extract named entities from speech. The extracted entities can help voice assistants better understand users' questions and instructions. However, existing Chinese Spoken NER datasets consist of laboratory-controlled data, collected by reading existing texts aloud in quiet environments, rather than natural spoken data, and the texts used for reading cover only a limited range of topics. These limitations hinder the development of Spoken NER in more natural and common real-world scenarios. To address this gap, we introduce a real-world Chinese Spoken NER dataset (RWCS-NER), encompassing open-domain daily conversations and task-oriented intelligent cockpit instructions. We compare several mainstream pipeline approaches on RWCS-NER. The results indicate that current methods, affected by Automatic Speech Recognition (ASR) errors, do not perform satisfactorily in real settings. To enhance Spoken NER in real-world scenarios, we propose two approaches: self-training-asr and mapping then distilling (MDistilling). Experiments show that both approaches achieve significant improvements, particularly MDistilling. Even compared with GPT4.0, MDistilling still achieves better results. We believe that our work will advance the field of Spoken NER in real-world settings.
- Anthology ID: 2024.findings-acl.111
- Volume: Findings of the Association for Computational Linguistics: ACL 2024
- Month: August
- Year: 2024
- Address: Bangkok, Thailand
- Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 1872–1884
- URL: https://aclanthology.org/2024.findings-acl.111
- DOI: 10.18653/v1/2024.findings-acl.111
- Cite (ACL): Shilin Zhou, Zhenghua Li, Chen Gong, Lei Zhang, Yu Hong, and Min Zhang. 2024. Chinese Spoken Named Entity Recognition in Real-world Scenarios: Dataset and Approaches. In Findings of the Association for Computational Linguistics: ACL 2024, pages 1872–1884, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal): Chinese Spoken Named Entity Recognition in Real-world Scenarios: Dataset and Approaches (Zhou et al., Findings 2024)
- PDF: https://preview.aclanthology.org/autopr/2024.findings-acl.111.pdf