融入音素特征的英-泰-老多语言神经机器翻译方法(English-Thai-Lao multilingual neural machine translation fused with phonemic features)
Zheng Shen (沈政), Cunli Mao (毛存礼), Zhengtao Yu (余正涛), Shengxiang Gao (高盛祥), Linqin Wang (王琳钦), Yuxin Huang (黄于欣)
Abstract
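“Multilingual neural machine translation is an effective way to improve translation quality for low-resource languages. Because scripts differ greatly across languages, existing methods struggle to obtain a unified word representation. Thai and Lao are low-resource languages that share phonemic similarity. Considering that exploiting language similarity can narrow semantic distance, we propose a multilingual word representation learning method that incorporates phoneme features: (1) we design a phoneme feature representation module and a Thai-Lao text representation module, and use a cross-attention mechanism to obtain Thai and Lao text representations fused with phoneme features, narrowing the semantic distance between Thai and Lao; (2) in the fine-tuning stage, we use parameter differentiation to obtain parameters specific to each language pair, alleviating the over-generalization caused by joint training. Experimental results on the ALT dataset show that the proposed method improves over the baseline model by 0.97 and 0.99 BLEU in the Thai-English and Lao-English translation directions, respectively.”
- Anthology ID: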
- 2022.ccl-1.28
- Volume:
- Proceedings of the 21st Chinese National Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Nanchang, China
- Editors:
- Maosong Sun (孙茂松), Yang Liu (刘洋), Wanxiang Che (车万翔), Yang Feng (冯洋), Xipeng Qiu (邱锡鹏), Gaoqi Rao (饶高琦), Yubo Chen (陈玉博)
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 305–316
- Language:
- Chinese
- URL:
- https://aclanthology.org/2022.ccl-1.28
- Cite (ACL):
- Zheng Shen, Cunli Mao, Zhengtao Yu, Shengxiang Gao, Linqin Wang, and Yuxin Huang. 2022. 融入音素特征的英-泰-老多语言神经机器翻译方法(English-Thai-Lao multilingual neural machine translation fused with phonemic features). In Proceedings of the 21st Chinese National Conference on Computational Linguistics, pages 305–316, Nanchang, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 融入音素特征的英-泰-老多语言神经机器翻译方法(English-Thai-Lao multilingual neural machine translation fused with phonemic features) (Shen et al., CCL 2022)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2022.ccl-1.28.pdf
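
The abstract describes fusing phoneme features into Thai and Lao text representations through cross-attention. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes a single attention head, no learned projection matrices, and a residual addition as the fusion step, none of which are stated in the abstract. The names `cross_attention_fuse`, `text_vecs`, and `phoneme_vecs` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(text_vecs, phoneme_vecs):
    """Fuse phoneme features into text representations.

    Each text vector acts as a query; the phoneme vectors act as both
    keys and values. Assumptions (not from the paper): single head,
    no learned projections, residual addition as the fusion step.
    """
    dim = len(text_vecs[0])
    scale = math.sqrt(dim)
    fused = []
    for query in text_vecs:
        # Scaled dot-product score of this text token against every phoneme.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in phoneme_vecs
        ]
        weights = softmax(scores)
        # Attention-weighted average of the phoneme vectors.
        context = [
            sum(w * value[i] for w, value in zip(weights, phoneme_vecs))
            for i in range(dim)
        ]
        # Residual connection keeps the original text information.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


# Toy inputs: two text token vectors and three phoneme vectors.
text_vecs = [[1.0, 0.0], [0.0, 1.0]]
phoneme_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention_fuse(text_vecs, phoneme_vecs)
```

The output has one fused vector per text token, so it can stand in for the original text representation in a downstream encoder. A real system would add learned query, key, and value projections and train them jointly with the translation model.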