Abstract
“语音命名实体识别(Speech Named Entity Recognition, SNER)旨在从音频中识别出语音中命名实体的边界、种类和内容,是口语理解中的重要任务之一。直接从语音中识别出命名实体,即端到端方法是SNER目前的主流方法。但是语音命名实体识别的训练语料较少,端到端模型存在以下问题:(1)在跨领域识别的情况下模型的识别效果会有大幅度的下降。(2)模型在识别过程中会因同音词等现象对命名实体漏标、错标,进一步影响命名实体识别的准确性。针对问题(1),本文提出使用预训练实体识别模型构建语音实体识别的训练语料。针对问题(2),本文提出采用预训练语言模型对语音命名实体识别的N-BEST列表重打分,利用预训练模型中的外部知识帮助端到端模型挑选出最好的结果。为了验证模型的领域迁移能力,本文标注了少样本口语型数据集MAGICDATA-NER,在此数据上的实验表明,本文提出的方法相对于传统方法在F1值上有43.29%的提高。”- Anthology ID:
- 2023.ccl-1.16
- Volume:
- Proceedings of the 22nd Chinese National Conference on Computational Linguistics
- Month:
- August
- Year:
- 2023
- Address:
- Harbin, China
- Editors:
- Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
- Venue:
- CCL
- SIG:
- Publisher:
- Chinese Information Processing Society of China
- Note:
- Pages:
- 174–185
- Language:
- Chinese
- URL:
- https://aclanthology.org/2023.ccl-1.16
- DOI:
- Cite (ACL):
- Tianwei Lan and Yuhang Guo. 2023. 融合预训练模型的端到端语音命名实体识别(End-to-End Speech Named Entity Recognition with Pretrained Models). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 174–185, Harbin, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 融合预训练模型的端到端语音命名实体识别(End-to-End Speech Named Entity Recognition with Pretrained Models) (Lan & Guo, CCL 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2023.ccl-1.16.pdf