Pengxiu Lu

Also published as: 芃秀


2025

Lemmatization of cuneiform languages presents a unique challenge due to their complex writing system, which combines syllabic and logographic elements. In this study, we investigate the effectiveness of the ByT5 model in addressing this challenge by developing and evaluating a ByT5-based lemmatization system. Experimental results demonstrate that ByT5 outperforms mT5 in this task, achieving an accuracy of 80.55% on raw lemmas and 82.59% on generalized lemmas, where sense numbers are removed. These findings highlight the potential of ByT5 for lemmatizing cuneiform languages and provide useful insights for future work on ancient text lemmatization.

2024

“篇章共指体现篇章概念的动态转移,成为近年研究热点。本文在梳理共指理论研究的基础上,综述了相关语料库及解析方法,发现共指语料库仍存在以下两个问题:共指关系标注粗疏与基本不考虑整句语义表示的融合。本文以句子级语义标注体系(中文抽象语义表示)为基础构建篇章共指体系,构建了 100 篇共指语料库。本体系涵盖 52 种句内语义关系和 8 种篇章共指关系,二者相结合构建的篇章共指语义图,为篇章级语义分析提供新的框架和数据资源。”