2025
DVAGen: Dynamic Vocabulary Augmented Generation
Wei Du | Nuowei Liu | Jie Wang | Jiahao Kuang | Tao Ji | Xiaoling Wang | Yuanbin Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Language models trained with a fixed vocabulary struggle to generalize to novel or out-of-vocabulary words, limiting their flexibility in handling diverse token combinations. Existing dynamic vocabulary approaches attempt to address this limitation but face challenges such as fragmented codebases, lack of support for modern LLMs, and limited inference scalability. To overcome these issues, we introduce DVAGen, a fully open-source, unified framework designed for training, evaluation, and visualization of dynamic vocabulary-augmented language models. Our framework modularizes the pipeline for ease of customization, integrates seamlessly with open-source LLMs, and is the first to provide both CLI and WebUI tools for real-time result inspection. We validate the effectiveness of dynamic vocabulary methods on modern LLMs and demonstrate support for batch inference, significantly improving inference throughput.
2024
Chinese Essay Rhetoric Recognition and Understanding (CERRU)
Nuowei Liu | Xinhao Chen | Yupei Ren | Man Lan | Xiaopeng Bai | Yuanbin Wu | Shaoguang Mao | Yan Xia
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
Rhetoric is fundamental to the reading comprehension and writing skills of primary and middle school students. However, current work recognizes single coarse-grained or fine-grained categories independently. In this paper, we propose CCL24-Eval Task6: Chinese Essay Rhetoric Recognition and Understanding (CERRU), consisting of 3 tracks: (1) Fine-grained Form-level Categories Recognition, (2) Fine-grained Content-level Categories Recognition, and (3) Rhetorical Component Extraction. A total of 32 teams registered to participate in CERRU and 9 teams submitted evaluation results, with 7 of these teams achieving an overall score that surpassed the baseline.
CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays
Nuowei Liu | Xinhao Chen | Hongyi Wu | Changzhi Sun | Man Lan | Yuanbin Wu | Xiaopeng Bai | Shaoguang Mao | Yan Xia
Findings of the Association for Computational Linguistics: EMNLP 2024