Phoneme Segmentation Based on Pre-trained Models and Sequence Modeling (基于预训练模型与序列建模的音素分割方法)
Yang Shanglong (杨尚龙), Yu Zhengtao (余正涛), Wang Wenjun (王文君), Dong Ling (董凌), Gao Shengxiang (高盛祥)
Abstract
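Phoneme segmentation is an important task in speech processing and is crucial for applications such as keyword spotting and automatic speech recognition. Traditional methods usually predict independently whether each audio frame is a phoneme boundary. This ignores the inherent relationships between a phoneme boundary and both the whole audio sequence and its neighboring frames, which reduces the accuracy and coherence of the segmentation. This paper proposes a phoneme segmentation method based on a pre-trained model and sequence modeling: acoustic features are extracted with the HuBERT model, a BiLSTM captures long-range dependencies, and a CRF then optimizes the output sequence, improving phoneme boundary detection. Experiments on the TIMIT and Buckeye datasets show that the proposed method outperforms existing techniques, demonstrating the effectiveness of sequence modeling for phoneme segmentation.
- Anthology ID: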
- 2024.ccl-1.49
- Volume:
- Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
- Month:
- July
- Year:
- 2024
- Address:
- Taiyuan, China
- Editors:
- Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 625–636
- Language:
- Chinese
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.ccl-1.49/
- Cite (ACL):
- Yang Shanglong, Yu Zhengtao, Wang Wenjun, Dong Ling, and Gao Shengxiang. 2024. Phoneme Segmentation Based on Pre-trained Models and Sequence Modeling. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 625–636, Taiyuan, China. Chinese Information Processing Society of China.
- Cite (Informal):
- Phoneme Segmentation Based on Pre-trained Models and Sequence Modeling (Yang et al., CCL 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.ccl-1.49.pdf
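The abstract describes a pipeline in which HuBERT features feed a BiLSTM whose per-frame scores are then refined by a CRF. The sketch below illustrates only that final step, Viterbi decoding under a linear-chain CRF, in plain Python; it is not the authors' code. It assumes two hypothetical labels (0 = not a boundary, 1 = boundary), emission scores that a BiLSTM would produce (hard-coded here), a hand-picked transition matrix that penalizes two consecutive boundary frames, and a 20 ms frame stride (HuBERT's usual output rate for 16 kHz audio). None of these values come from the paper.

```python
from typing import List, Sequence

# Assumption: 20 ms per frame (HuBERT's usual 50 Hz output rate for 16 kHz audio).
FRAME_STRIDE_S = 0.02


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label path under a linear-chain CRF.

    emissions[t][y]: score of label y at frame t (e.g. a BiLSTM output).
    transitions[a][b]: score of moving from label a to label b.
    """
    if not emissions:
        return []
    num_labels = len(transitions)
    # best[y]: score of the best path that ends in label y at the current frame.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for y in range(num_labels):
            # Pick the predecessor label that maximizes path score plus transition.
            prev = max(range(num_labels), key=lambda a: best[a] + transitions[a][y])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
        best = new_best
        backpointers.append(pointers)
    # Backtrack from the best final label to recover the full path.
    label = max(range(num_labels), key=lambda y: best[y])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


def boundary_times(labels: Sequence[int], stride: float = FRAME_STRIDE_S) -> List[float]:
    """Convert frames labeled 1 (boundary) into times in seconds."""
    return [round(t * stride, 4) for t, y in enumerate(labels) if y == 1]


# Hypothetical per-frame scores [non-boundary, boundary]; frames 1 and 2
# both look like boundaries if each frame is judged independently.
EMISSIONS = [[2.0, 0.0], [0.0, 1.0], [0.2, 0.9], [2.0, 0.0], [2.0, 0.0], [0.0, 1.5]]
# Hypothetical transitions: a boundary followed by a boundary costs 3.0.
TRANSITIONS = [[0.0, 0.0], [0.0, -3.0]]

if __name__ == "__main__":
    path = viterbi_decode(EMISSIONS, TRANSITIONS)
    print(path)                  # → [0, 1, 0, 0, 0, 1]
    print(boundary_times(path))  # → [0.02, 0.1]
```

Independent per-frame argmax would label these frames `[0, 1, 1, 0, 0, 1]`. With the penalty on the boundary-to-boundary transition, the decoder keeps only one of the two adjacent boundary-like frames. This is the kind of neighboring-frame consistency the abstract attributes to CRF sequence optimization.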