“细粒度英汉机器翻译错误分析语料库”的构建与思考(Construction of Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation and Its Implications)
Bailian Qiu (裘白莲), Mingwen Wang (王明文), Maoxi Li (李茂西), Cong Chen (陈聪), Fan Xu (徐凡)
Abstract
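Machine translation error analysis aims to identify the errors in machine-translated output, including error types and error distribution, and plays an important role in machine translation research and application. This paper combines human post-editing with error analysis: post-editing operations are annotated with error labels, using a combination of automatic and manual annotation, to build a fine-grained error analysis corpus for English-Chinese machine translation. Each annotated sample comprises the source sentence, the machine translation, the human reference translation, the post-edited translation, the word error rate, and error type annotations. The annotated error types include addition, omission, mistranslation (wrong word), word order errors, untranslated words, and named entity translation errors. An annotation consistency check indicates that the annotation is valid, and statistical analysis of the annotated corpus can effectively guide both the development of machine translation systems and post-editing by human translators.
- Anthology ID: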
- 2020.ccl-1.39
- Volume:
- Proceedings of the 19th Chinese National Conference on Computational Linguistics
- Month:
- October
- Year:
- 2020
- Address:
- Haikou, China
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 424–433
- Language:
- Chinese
- URL:
- https://aclanthology.org/2020.ccl-1.39
- Cite (ACL):
- Bailian Qiu, Mingwen Wang, Maoxi Li, Cong Chen, and Fan Xu. 2020. “细粒度英汉机器翻译错误分析语料库”的构建与思考(Construction of Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation and Its Implications). In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 424–433, Haikou, China. Chinese Information Processing Society of China.
- Cite (Informal):
- “细粒度英汉机器翻译错误分析语料库”的构建与思考(Construction of Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation and Its Implications) (Qiu et al., CCL 2020)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2020.ccl-1.39.pdf
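
The abstract lists the fields of each annotated sample, including a word error rate. As a rough illustration only, the Python sketch below shows one way such a record could be represented, and how a word-level error rate between the machine translation and its post-edited version might be computed with a standard Levenshtein distance. The class name, field names, English error-type labels, pre-tokenized inputs, and the choice to normalize by the post-edited length are all assumptions made for this example; the paper's actual data format and WER definition may differ.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical English labels for the error types named in the abstract.
ERROR_TYPES = (
    "addition",
    "omission",
    "mistranslation",
    "word_order",
    "untranslated",
    "named_entity",
)


def word_error_rate(hyp: List[str], ref: List[str]) -> float:
    """Token-level Levenshtein distance from hyp to ref, divided by len(ref)."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the distance between the hyp prefix seen so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a token from hyp
                            curr[j - 1] + 1,      # insert a token from ref
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


@dataclass
class AnnotatedSample:
    """Illustrative record layout; not the corpus's real schema."""
    source: str                     # English source sentence
    machine_translation: List[str]  # tokenized MT output
    reference: str                  # human reference translation
    post_edit: List[str]            # tokenized post-edited translation
    error_types: List[str] = field(default_factory=list)

    @property
    def wer(self) -> float:
        # Assumption: the rate is measured against the post-edited translation.
        return word_error_rate(self.machine_translation, self.post_edit)


sample = AnnotatedSample(
    source="I like apples",
    machine_translation=["我", "喜欢", "吃", "苹果"],
    reference="我喜欢苹果",
    post_edit=["我", "喜欢", "苹果"],
    error_types=["addition"],
)
print(round(sample.wer, 3))
```

In this made-up sample the machine translation contains one extra token relative to the three-token post-edit, so the sketch reports one edit over three tokens and the sample is labeled with the hypothetical `addition` type.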