Qiliang Liang

Also published as: 梁启亮


2025

LTRS: Improving Word Sense Disambiguation via Learning to Rank Senses
Hansi Wang | Yue Wang | Qiliang Liang | Yang Liu
Proceedings of the 31st International Conference on Computational Linguistics

Word Sense Disambiguation (WSD) is a fundamental task critical for accurate semantic understanding. Conventional training strategies usually consider only the predefined senses of a target word and learn each of them from relatively few instances, neglecting the influence of similar senses. To address these problems, we propose Learning to Rank Senses (LTRS) to enhance the task. The method helps a model learn to represent and disambiguate senses from a broadened range of instances by ranking an expanded list of sense definitions. With LTRS, our model achieves a SOTA F1 score of 79.6% on Chinese WSD and remains robust in low-resource settings. Moreover, it shows excellent training efficiency, converging faster than previous methods. This provides a new technical approach to WSD and may also apply to the task in other languages.
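The ranking idea in the abstract can be pictured as scoring every candidate sense definition against the context and training with a listwise loss over the whole candidate list. The snippet below is only a hypothetical illustration of that general technique in PyTorch, not the paper's LTRS implementation; the toy scores stand in for encoder outputs.

```python
# Hypothetical sketch: listwise ranking loss over candidate sense definitions.
# "scores" would come from an encoder comparing the context with each definition.
import torch
import torch.nn.functional as F

def listwise_rank_loss(scores: torch.Tensor, gold_index: int) -> torch.Tensor:
    """Softmax cross-entropy over the expanded list of candidate sense scores.

    scores: shape (num_candidate_senses,), higher = better match to the context.
    gold_index: position of the correct sense definition in the candidate list.
    """
    target = torch.tensor([gold_index])
    return F.cross_entropy(scores.unsqueeze(0), target)

# Toy usage: four candidate definitions, the second one is correct.
scores = torch.tensor([0.2, 1.7, -0.3, 0.5], requires_grad=True)
loss = listwise_rank_loss(scores, gold_index=1)
loss.backward()
print(float(loss))
```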

2024

Disambiguate Words like Composing Them: A Morphology-Informed Approach to Enhance Chinese Word Sense Disambiguation
Yue Wang | Qiliang Liang | Yaqi Yin | Hansi Wang | Yang Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In parataxis languages like Chinese, word meanings are highly correlated with morphological knowledge, which can help disambiguate word senses. However, in-depth exploration of morphological knowledge in previous word sense disambiguation (WSD) methods is still lacking due to the absence of publicly available resources. In this paper, we are motivated to enhance Chinese WSD with full morphological knowledge, including both word-formations and morphemes. We first construct the largest releasable Chinese WSD resources, including the lexico-semantic inventories MorInv and WrdInv, a Chinese WSD dataset MiCLS, and an out-of-vocabulary (OOV) test set. Then, we propose a model, MorBERT, to fully leverage this morphology-informed knowledge for Chinese WSD, achieving a SOTA F1 of 92.18% on the task. Finally, we demonstrate the model's robustness in low-resource settings and its generalizability to OOV senses. These resources and methods may bring new insights into, and solutions for, various downstream tasks in both computational and humanistic fields.

基于汉语字词资源的检索增强生成与应用评估(Chinese Character- and Word-Based Retrieval Augmented Generation and Application)
Yaqi Yin (殷雅琦) | Yang Liu (刘扬) | Yue Wang (王悦) | Qiliang Liang (梁启亮)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

"Chinese follows the principle that characters form words and words form sentences, so character- and word-related information is a fundamental and critical computational resource. In the era of large language models, mining and evaluating the utility of such resources is an important line of research for enhancing models' language abilities. As an effective way to combine resources with models, retrieval augmented generation has so far been applied to such resources mostly for endangered languages that models have not learned, while its potential value for languages the models have already learned remains to be explored. From a linguistic perspective, this paper constructs character and word resources with good example-sentence coverage and richness and, following the retrieval augmented generation paradigm, explores how to combine these resources with different tasks and models. Evaluation experiments show that the method brings significant accuracy improvements across all tested models and tasks, averaging 4.78%; in particular, it improves morpheme sense disambiguation, word sense disambiguation, and metaphor detection by 6.91%, 4.24%, and 3.19%, respectively, demonstrating the potential value of character and word resources for models' accurate language understanding. The resource construction, method exploration, and application evaluation provide new ideas and methods for combining linguistic resources with large language models."
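The pipeline described above amounts to retrieving character- and word-level entries relevant to the input and injecting them into the prompt before generation. Below is a hypothetical, self-contained sketch of that retrieval-augmented prompting idea; LEXICON, retrieve_entries, and build_prompt are illustrative stand-ins, not the paper's actual resource or code, and the final call to a language model is omitted.

```python
# Hypothetical sketch of retrieval-augmented prompting with a character/word lexicon.
LEXICON = {
    "明": "morpheme: bright; clear; to understand",
    "白": "morpheme: white; plain; in vain",
    "明白": "word: to understand; clear, obvious",
}

def retrieve_entries(text: str) -> list[str]:
    """Collect lexicon entries whose headword appears in the input text."""
    return [f"{key}: {gloss}" for key, gloss in LEXICON.items() if key in text]

def build_prompt(question: str) -> str:
    """Prepend the retrieved dictionary entries to the question as reference context."""
    context = "\n".join(retrieve_entries(question))
    return f"Reference entries:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("“明白”在这句话里的语素义是什么？"))
```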

Morpheme Sense Disambiguation: A New Task Aiming for Understanding the Language at Character Level
Yue Wang | Hua Zheng | Yaqi Yin | Hansi Wang | Qiliang Liang | Yang Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Morphemes serve as a strong linguistic feature for capturing lexical semantics, with higher coverage than words and more naturalness than sememes. However, due to the lack of morpheme-informed resources and the expense of manual annotation, morpheme-enhanced methods remain largely unexplored in computational linguistics. To address this issue, we propose the task of Morpheme Sense Disambiguation (MSD), with two subtasks, in-text and in-word, analogous to Word Sense Disambiguation (WSD) and Sememe Prediction (SP), to generalize morpheme features to more tasks. We first build the MorDis resource for Chinese, comprising MorInv as a morpheme inventory and MorTxt and MorWrd as two types of morpheme-annotated datasets. Next, we provide two baselines for each evaluation; the best model yields a promising precision of 77.66% on in-text MSD and 88.19% on in-word MSD, indicating its comparability with WSD and superiority over SP. Finally, we demonstrate that predicted morphemes achieve performance comparable to the ground-truth ones in a downstream application of Definition Generation (DG). This validates the feasibility and applicability of the proposed tasks. The resources and workflow of MSD will provide new insights and solutions for downstream tasks, including DG as well as WSD, the training of pre-trained models, etc.

2023

汉语语义构词的资源建设与计算评估(Construction of Chinese Semantic Word-Formation and its Computing Applications)
Yue Wang (王悦) | Yang Liu (刘扬) | Qiliang Liang (梁启亮) | Hansi Wang (王涵思)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

"Chinese is a parataxis language, in which the ways morphemes form words and the regularities of word formation are important factors for describing and understanding word meanings. Regarding how morphemes form words, linguists hold two views, grammatical word-formation and semantic word-formation; of the two, semantic word-formation expresses the relations between morphemes in greater depth. This paper takes the semantic word-formation route: from a linguistic perspective and considering the characteristics of Chinese word formation, it proposes a computation-oriented system of semantic word-formation structures, builds a Chinese semantic word-formation knowledge base by combining random-forest automatic annotation with manual verification, and computationally evaluates the resource on the task of definition generation. The experiments achieve good results: definition generation based on the semantic word-formation knowledge base reaches a BLEU score of 25.07, a 3.17% improvement over the previous grammatical word-formation approach, preliminarily validating the effectiveness of this knowledge representation. The knowledge representation and resource construction will offer new ideas and solutions for applications in the humanities, information processing, and other fields."
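The "random-forest automatic annotation" step mentioned above can be pictured as a classifier that proposes a word-formation structure label for each word, which annotators then verify. The sketch below is a hypothetical illustration with scikit-learn; the features, labels, and data are illustrative stand-ins, not the paper's actual feature design or knowledge base.

```python
# Hypothetical sketch: random-forest labeling of semantic word-formation structures.
from sklearn.ensemble import RandomForestClassifier

# Toy feature vectors for two-morpheme words, e.g. [POS id of morpheme 1,
# POS id of morpheme 2, flag for a shared semantic class]; labels are illustrative tags.
X_train = [
    [0, 0, 0],  # noun + noun
    [1, 0, 0],  # verb + noun
    [2, 0, 1],  # adjective + noun, shared semantic class
    [1, 1, 0],  # verb + verb
]
y_train = ["coordinate", "verb-object", "modifier-head", "coordinate"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Automatically propose a structure label for an unseen word; in the workflow described
# above, such predictions would be checked manually before entering the knowledge base.
print(clf.predict([[2, 0, 0]]))
```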