Qiliang Liang
Also published as:
启亮 梁
Word Sense Disambiguation (WSD) is a fundamental task critical for accurate semantic understanding. Conventional training strategies usually consider only the predefined senses of target words and learn each of them from relatively limited instances, neglecting the influence of similar senses. To address these problems, we propose the method of Learning to Rank Senses (LTRS) to enhance the task. This method helps a model learn to represent and disambiguate senses from a broadened range of instances by ranking an expanded list of sense definitions. By employing LTRS, our model achieves a SOTA F1 score of 79.6% in Chinese WSD and exhibits robustness in low-resource settings. Moreover, it shows excellent training efficiency, achieving faster convergence than previous methods. This provides a new technical approach to WSD and may also apply to WSD in other languages.
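As a rough illustration of what "ranking an expanded list of sense definitions" could look like in code, the sketch below scores a candidate list and applies a generic listwise softmax cross-entropy loss. The abstract does not specify the actual LTRS objective, so the loss, the function names `listwise_ranking_loss` and `rank_candidates`, and the toy scores are all assumptions made for illustration only.

```python
import math


def listwise_ranking_loss(scores, gold_index):
    """Softmax cross-entropy over a candidate list (a generic listwise loss,
    assumed here; not necessarily the objective used by LTRS).

    The loss is small when the gold sense definition has the highest score.
    """
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum - scores[gold_index]


def rank_candidates(definitions, scores):
    """Return the definitions sorted from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [definitions[i] for i in order]


# Hypothetical candidate list: the target word's own senses plus a
# similar sense borrowed from another word (the "expanded" part).
candidates = ["sense A of target", "sense B of target", "similar sense of other word"]
toy_scores = [2.0, 0.5, 1.0]  # stand-ins for model outputs

ranked = rank_candidates(candidates, toy_scores)
loss = listwise_ranking_loss(toy_scores, gold_index=0)
```

In this sketch, the candidate list can include definitions beyond the target word's predefined senses, which is one plausible reading of how a model would see a broader range of instances during training.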
Link Prediction (LP) aims to predict missing triple information within a Knowledge Graph (KG). Existing LP methods have sought to improve performance by integrating structural and textual information. However, for lexico-semantic KGs designed to document fine-grained sense distinctions, these types of information may not be sufficient to support effective LP. From a linguistic perspective, word senses within lexico-semantic relations usually show systematic differences in their sememic components. In light of this, we are motivated to enhance LP with sememe knowledge. We first construct a Sememe Prediction (SP) dataset, SememeDef, for learning such knowledge, and two Chinese datasets, HN7 and CWN5, for LP evaluation. Then, we propose a method, SememeLP, to fully leverage this knowledge for LP. It consistently and significantly improves LP performance in both English and Chinese, achieving SOTA MRR of 75.1%, 80.5%, and 77.1% on WN18RR, HN7, and CWN5, respectively. Finally, we conduct an in-depth analysis that makes clear how sememic components can benefit LP for lexico-semantic KGs, providing promising progress toward their completion.
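The MRR figures above refer to Mean Reciprocal Rank, the standard LP metric: for each test triple, the rank of the correct entity among all scored candidates is inverted, and the results are averaged. A minimal sketch follows; the helper names `rank_of_gold` and `mean_reciprocal_rank` and the toy scores are illustrative assumptions, and filtering conventions used in actual LP benchmarks are omitted.

```python
def rank_of_gold(scores, gold_index):
    """1-based rank of the gold candidate: one plus the number of
    candidates scored strictly higher (ties resolved optimistically)."""
    gold_score = scores[gold_index]
    return 1 + sum(1 for s in scores if s > gold_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy example: three queries, each with candidate scores and a gold index.
queries = [
    ([0.9, 0.1, 0.3], 0),  # gold ranked 1st
    ([0.2, 0.8, 0.5], 2),  # gold ranked 2nd
    ([0.7, 0.6, 0.5, 0.4], 3),  # gold ranked 4th
]
ranks = [rank_of_gold(scores, gold) for scores, gold in queries]
mrr = mean_reciprocal_rank(ranks)
```

An MRR of 75.1% therefore means that, averaged over queries, the reciprocal of the correct entity's rank is about 0.751.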
In parataxis languages like Chinese, word meanings are highly correlated with morphological knowledge, which can help to disambiguate word senses. However, in-depth exploration of morphological knowledge in previous word sense disambiguation (WSD) methods is still lacking due to the absence of publicly available resources. In this paper, we are motivated to enhance Chinese WSD with full morphological knowledge, including both word-formations and morphemes. We first construct the largest releasable Chinese WSD resources, including the lexico-semantic inventories MorInv and WrdInv, a Chinese WSD dataset MiCLS, and an out-of-vocabulary (OOV) test set. Then, we propose a model, MorBERT, to fully leverage this morphology-informed knowledge for Chinese WSD and achieve a SOTA F1 of 92.18% in the task. Finally, we demonstrate the model’s robustness in low-resource settings and generalizability to OOV senses. These resources and methods may bring new insights into and solutions for various downstream tasks in both computational and humanistic fields.
Morphemes serve as a strong linguistic feature for capturing lexical semantics, offering higher coverage than words and being more natural than sememes. However, due to the lack of morpheme-informed resources and the expense of manual annotation, morpheme-enhanced methods remain largely unexplored in Computational Linguistics. To address this issue, we propose the task of Morpheme Sense Disambiguation (MSD), with two subtasks, in-text and in-word, similar to Word Sense Disambiguation (WSD) and Sememe Prediction (SP) respectively, to generalize morpheme features to more tasks. We first build the MorDis resource for Chinese, including MorInv as a morpheme inventory and MorTxt and MorWrd as two types of morpheme-annotated datasets. Next, we provide two baselines in each evaluation; the best model yields a promising precision of 77.66% on in-text MSD and 88.19% on in-word MSD, indicating its comparability with WSD and superiority over SP. Finally, we demonstrate that predicted morphemes achieve performance comparable to the ground-truth ones on a downstream application, Definition Generation (DG). This validates the feasibility and applicability of our proposed tasks. The resources and workflow of MSD will provide new insights and solutions for downstream tasks, including DG, WSD, and the training of pre-trained models.