Chenhui Xie


2022

句式结构树库的自动构建研究(Automatic Construction of Sentence Pattern Structure Treebank)
Chenhui Xie (谢晨晖) | Zhengsheng Hu (胡正升) | Liner Yang (杨麟儿) | Tianxin Liao (廖田昕) | Erhong Yang (杨尔弘)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

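“The sentence pattern structure treebank is a syntactic resource built on the theory of sentence-based grammar, and it is of great significance for Chinese language teaching and for research such as automatic syntactic parsing of sentence pattern structures. The existing sentence pattern structure treebank corpora come mainly from the textbook domain, and annotated data from other domains is scarce, so how to efficiently expand a high-quality syntactic treebank is a question worth studying. Manually annotating a syntactic treebank is time-consuming and laborious, and the quality of the resulting treebank is hard to guarantee. For this reason, this paper attempts to use a rule-based method to convert the Penn Chinese Treebank (CTB) into a sentence pattern structure treebank, thereby expanding the scale of the existing sentence pattern structure treebank. Experimental results show that the proposed method based on treebank conversion rules is effective.”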

COMPILING: A Benchmark Dataset for Chinese Complexity Controllable Definition Generation
Jiaxin Yuan | Cunliang Kong | Chenhui Xie | Liner Yang | Erhong Yang
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“The definition generation task aims to automatically generate a word’s definition within a specific context. However, owing to the lack of datasets covering different complexities, the definitions produced by models tend to stay at the same complexity level. This paper proposes a novel task of generating definitions for a word with controllable complexity levels. Correspondingly, we introduce COMPILING, a dataset that provides detailed information about Chinese definitions, with each definition labeled with its complexity level. The COMPILING dataset includes 74,303 words and 106,882 definitions. To the best of our knowledge, it is the largest dataset for the Chinese definition generation task. We select various representative generation methods as baselines for this task and conduct evaluations, which illustrate that our dataset plays an outstanding role in helping models generate definitions at different complexity levels. We believe that the COMPILING dataset will benefit further research in complexity controllable definition generation.”