基于汉语字词资源的检索增强生成与应用评估(Chinese Character- and Word-Based Retrieval Augmented Generation and Application)
Yin Yaqi (殷雅琦), Liu Yang (刘扬), Wang Yue (王悦), Liang Qiliang (梁启亮)
Abstract
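“Chinese follows the principle of ‘forming words from characters and sentences from words’, so information about characters and words is a fundamental and critical computational resource. In the era of large language models, mining and evaluating the utility of such resources is an important research direction for enhancing the linguistic abilities of models. Retrieval-augmented generation (RAG) is one effective way of combining such resources with models, but its applications to them have so far focused mostly on endangered languages that the models have not learned; its potential value for languages the models have already learned remains to be explored. Taking a linguistic perspective, this paper constructs character and word resources with good example-sentence coverage and richness and, following the RAG technical route, explores ways of combining these resources with different tasks and models. Evaluation experiments show that the method yields significant accuracy improvements across all tested models and tasks, 4.78% on average; in particular, accuracy rises by 6.91%, 4.24%, and 3.19% on morpheme sense disambiguation, word sense disambiguation, and metaphor recognition, respectively. These results demonstrate the potential value of character and word resources for a model's accurate understanding of language. The resource construction, methodological exploration, and application evaluation presented here offer new ideas and methods for integrating linguistic resources with large language models.”
- Anthology ID: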
- 2024.ccl-1.3
- Volume:
- Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
- Month:
- July
- Year:
- 2024
- Address:
- Taiyuan, China
- Editors:
- Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 27–45
- Language:
- Chinese
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.ccl-1.3/
- Cite (ACL):
- Yin Yaqi, Liu Yang, Wang Yue, and Liang Qiliang. 2024. 基于汉语字词资源的检索增强生成与应用评估(Chinese Character- and Word-Based Retrieval Augmented Generation and Application). In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 27–45, Taiyuan, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 基于汉语字词资源的检索增强生成与应用评估(Chinese Character- and Word-Based Retrieval Augmented Generation and Application) (Yaqi et al., CCL 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.ccl-1.3.pdf
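The abstract describes retrieving character and word entries and supplying them to the model as context. The Python sketch below illustrates one way such a retrieval-augmented prompt could be assembled for word sense disambiguation. It is a minimal sketch under stated assumptions: the toy lexicon, the `Sense`, `retrieve`, and `build_prompt` names, the substring-matching retrieval, and the prompt wording are all hypothetical and are not the authors' resource, retrieval method, or prompts; the call to an actual language model is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One dictionary sense: a gloss plus example sentences (hypothetical schema)."""
    gloss: str
    examples: list[str] = field(default_factory=list)


# Hypothetical toy lexicon (headword -> senses); NOT the paper's resource.
LEXICON: dict[str, list[Sense]] = {
    "打": [
        Sense("hit, strike", ["他打了我一下。"]),
        Sense("make a phone call", ["我给妈妈打电话。"]),
    ],
    "电话": [Sense("telephone", ["电话响了。"])],
}


def retrieve(sentence: str, lexicon: dict[str, list[Sense]]) -> dict[str, list[Sense]]:
    """Return entries whose headword occurs in the sentence, longest headword first.

    Plain substring matching is an assumption made to keep the sketch short.
    """
    hits = {word: senses for word, senses in lexicon.items() if word in sentence}
    return dict(sorted(hits.items(), key=lambda kv: -len(kv[0])))


def build_prompt(sentence: str, target: str, lexicon: dict[str, list[Sense]]) -> str:
    """Prepend the retrieved entries to a sense-selection question (hypothetical wording)."""
    lines = [f"Sentence: {sentence}", f"Target word: {target}", "Dictionary entries:"]
    for word, senses in retrieve(sentence, lexicon).items():
        for i, sense in enumerate(senses, 1):
            examples = " / ".join(sense.examples)
            lines.append(f"  {word} [{i}] {sense.gloss}; examples: {examples}")
    lines.append(f"Which numbered sense of '{target}' is used? Answer with the number.")
    return "\n".join(lines)
```

For example, `build_prompt("我给妈妈打电话。", "打", LEXICON)` lists the entry for 电话 and both senses of 打, with their example sentences, before asking for a sense number; the resulting string would then be sent to the model. A practical system would likely replace substring matching with word segmentation and some form of relevance ranking.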