Hong-Gee Kim
2026
CL2GEC: A Multi-Discipline Benchmark for Continual Learning in Chinese Literature Grammatical Error Correction
Shang Qin | Jingheng Ye | Yinghui Li | Hai-Tao Zheng | Qi Li | Jinxiao Shan | Zhixing Li | Hong-Gee Kim
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The growing demand for automated writing assistance in diverse academic domains highlights the need for robust Chinese Grammatical Error Correction (CGEC) systems that can adapt across disciplines. However, existing CGEC research largely lacks dedicated benchmarks for multi-disciplinary academic writing, overlooking continual learning (CL) as a promising solution to handle domain-specific linguistic variation and prevent catastrophic forgetting. To fill this crucial gap, we introduce CL2GEC, the first Continual Learning benchmark for Chinese Literature Grammatical Error Correction, designed to evaluate adaptive CGEC across multiple academic fields. Our benchmark includes 10,000 human-annotated sentences spanning 10 disciplines, each exhibiting distinct linguistic styles and error patterns. CL2GEC focuses on evaluating grammatical error correction in a continual learning setting, simulating sequential exposure to diverse academic disciplines to reflect real-world editorial dynamics. We evaluate large language models under sequential tuning, parameter-efficient adaptation, and four representative CL algorithms, using both standard GEC metrics and continual learning metrics adapted to task-level variation. Experimental results reveal that regularization-based methods mitigate forgetting more effectively than replay-based or naive sequential approaches. Our benchmark provides a rigorous foundation for future research in adaptive grammatical error correction across diverse academic domains.
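The abstract mentions continual learning metrics but does not list them. As an illustration only, the sketch below computes three metrics commonly used in the continual learning literature (average final score, backward transfer, and average forgetting) from a score matrix. The matrix layout, the function names, and the numbers are assumptions made for this example, not values or definitions taken from the paper.

```python
from typing import List

# scores[i][j] = score on discipline j after training on disciplines 0..i.
# This layout is an assumption for the sketch, not the paper's definition.
ScoreMatrix = List[List[float]]


def average_final_score(scores: ScoreMatrix) -> float:
    """Mean score over all tasks after the last training stage."""
    final_row = scores[-1]
    return sum(final_row) / len(final_row)


def backward_transfer(scores: ScoreMatrix) -> float:
    """Mean change on each earlier task between just after learning it
    and the end of training. Negative values indicate forgetting."""
    num_tasks = len(scores)
    if num_tasks < 2:
        raise ValueError("backward transfer needs at least two tasks")
    deltas = [scores[-1][j] - scores[j][j] for j in range(num_tasks - 1)]
    return sum(deltas) / len(deltas)


def average_forgetting(scores: ScoreMatrix) -> float:
    """Mean gap between the best earlier score on a task and its final score."""
    num_tasks = len(scores)
    if num_tasks < 2:
        raise ValueError("forgetting needs at least two tasks")
    drops = []
    for j in range(num_tasks - 1):
        # Best score on task j at any stage from when it was learned
        # up to (but not including) the final stage.
        best_earlier = max(scores[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_earlier - scores[-1][j])
    return sum(drops) / len(drops)


# Hypothetical scores for three disciplines trained in sequence.
example_scores = [
    [0.50, 0.00, 0.00],
    [0.25, 0.75, 0.00],
    [0.25, 0.50, 1.00],
]
print(average_final_score(example_scores))
print(backward_transfer(example_scores))
print(average_forgetting(example_scores))
```

A method that mitigates forgetting well would show backward transfer close to zero and low average forgetting on such a matrix.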
GMSA: Enhancing Context Compression via Group Merging and Layer Semantic Alignment
Jiwei Tang | Zhicheng Zhang | Shunlong Wu | Jingheng Ye | Lichen Bai | Zitai Wang | Tingwei Lu | Lin Hai | Yiming Zhao | Hai-Tao Zheng | Hong-Gee Kim
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have achieved remarkable performance across a wide range of Natural Language Processing (NLP) tasks. However, in long-context scenarios, they face two challenges: high computational cost and information redundancy. To address these challenges, we propose GMSA, an encoder-decoder context compression framework that generates a compact sequence of soft tokens for downstream tasks. GMSA introduces Group Merging to achieve more uniform aggregation, mitigating semantic dominance during autoencoder pretraining, and Layer Semantic Alignment (LSA) to bridge the semantic gap between high-level abstract semantics and low-level input semantics. We first pretrain GMSA as an autoencoder and then fine-tune it for downstream tasks. Experiments demonstrate that GMSA improves context reconstruction compared with existing soft prompt compression paradigms and outperforms baselines on multiple long-context question answering and summarization benchmarks across two backbone models, while maintaining low end-to-end latency.
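The abstract does not spell out how Group Merging is computed. One plausible reading, shown here purely as an assumption, is that consecutive hidden states are split into equal-size groups and each group is averaged into a single soft token. The function name `group_merge` and the plain-list representation are hypothetical choices for this sketch.

```python
from typing import List

Vector = List[float]


def group_merge(hidden_states: List[Vector], group_size: int) -> List[Vector]:
    """Average each run of `group_size` consecutive vectors into one vector.

    Assumption: contiguous, equal-size groups with mean pooling. A shorter
    final group is averaged over the vectors it actually contains.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    merged: List[Vector] = []
    for start in range(0, len(hidden_states), group_size):
        group = hidden_states[start:start + group_size]
        dim = len(group[0])
        # Every vector in the group contributes equally to the merged token.
        merged.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        )
    return merged


# Four 2-dimensional hidden states compressed to two soft tokens.
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(group_merge(states, group_size=2))
```

Averaging gives every position in a group the same weight, which is one way to read the "more uniform aggregation" that the abstract describes.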
2025
CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction
Jingheng Ye | Zishan Xu | Yinghui Li | Linlin Song | Qingyu Zhou | Hai-Tao Zheng | Ying Shen | Wenhao Jiang | Hong-Gee Kim | Ruitong Liu | Xin Su | Zifei Shan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The paper focuses on the interpretability of Grammatical Error Correction (GEC) evaluation metrics, which has received little attention in previous studies. To bridge the gap, we introduce **CLEME2.0**, a reference-based metric describing four fundamental aspects of GEC systems: hit-correction, wrong-correction, under-correction, and over-correction. Together, these aspects expose critical qualities and locate drawbacks of GEC systems. Evaluating systems by combining these aspects also leads to superior human consistency over other reference-based and reference-less metrics. Extensive experiments on two human judgment datasets and six reference datasets demonstrate the effectiveness and robustness of our method, achieving a new state-of-the-art result. Our code is released at https://github.com/THUKElab/CLEME.
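To make the four aspects concrete, here is a deliberately simplified sketch that counts them by comparing hypothesis edits with reference edits on exact span matches. This is not the CLEME2.0 algorithm or its released code; the edit representation `(start, end, replacement)`, the function name `classify_edits`, and the matching rule are all assumptions for illustration.

```python
from typing import Dict, Set, Tuple

# An edit replaces source tokens [start, end) with `replacement`.
Edit = Tuple[int, int, str]


def classify_edits(hyp_edits: Set[Edit], ref_edits: Set[Edit]) -> Dict[str, int]:
    """Count hit, wrong, under, and over corrections by exact span matching."""
    ref_by_span = {(start, end): rep for start, end, rep in ref_edits}
    hyp_by_span = {(start, end): rep for start, end, rep in hyp_edits}
    counts = {"hit": 0, "wrong": 0, "under": 0, "over": 0}

    for span, replacement in hyp_by_span.items():
        if span not in ref_by_span:
            counts["over"] += 1    # system edited text the reference left alone
        elif ref_by_span[span] == replacement:
            counts["hit"] += 1     # same span, same correction
        else:
            counts["wrong"] += 1   # same span, different correction

    for span in ref_by_span:
        if span not in hyp_by_span:
            counts["under"] += 1   # reference edit the system missed

    return counts


# Hypothetical edits for a single sentence.
hypothesis = {(0, 1, "The"), (2, 3, "goes"), (5, 6, "a")}
reference = {(0, 1, "The"), (2, 3, "went"), (7, 8, "home")}
print(classify_edits(hypothesis, reference))
```

Reporting the four counts separately, rather than a single score, shows whether a system tends to miss errors or to over-edit correct text.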
RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
Qingsong Lv | Yangning Li | Zihua Lan | Zishan Xu | Jiwei Tang | Tingwei Lu | Yinghui Li | Wenhao Jiang | Hong-Gee Kim | Hai-Tao Zheng | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2025
Instruction tuning of large language models (LLMs) benefits more from a handful of high-quality examples than from hordes of low-quality ones. Existing selection methods typically rely on static, heuristic quality scores and are executed only once before training. Consequently, they neither adapt to the changing state of the model nor target downstream objectives, leaving substantial room for optimization. We propose RAISE (**R**einforced **A**daptive **I**nstruction **SE**lection), a *dynamic*, *task-driven* framework that integrates selection into every training step. At each step, RAISE estimates the expected contribution of each candidate instruction to task performance and admits only the most helpful. By modeling this process as sequential decision making, we optimize the selector with reinforcement learning, yielding an interpretable policy specialized for the target task. Extensive experiments show that RAISE reaches comparable or better results than full-data training while updating only 1% of the steps, demonstrating both high efficacy and significant computational savings.
2024
Depth Aware Hierarchical Replay Continual Learning for Knowledge Based Question Answering
Zhixiong Cao | Hai-Tao Zheng | Yangning Li | Jin Xu | Rongsheng Li | Hong-Gee Kim
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Continual learning is an emerging area of machine learning that addresses the problem of models adapting well to the latest data but losing the ability to remember past data as the data source changes. A widely adopted solution is to keep a small memory of previously learned data for replay. Most previous studies on continual learning have focused on classification tasks, such as image classification and text classification, where the model only needs to categorize the input data. Inspired by the human ability to incrementally learn knowledge and solve different problems using learned knowledge, we consider a more practical scenario: continual learning for knowledge-based question answering. In this scenario, each question differs from the others (different fact triples are needed to answer each one), whereas classification tasks only need to find the feature boundaries between categories, that is, the curves or surfaces that separate categories in the feature space. To address this issue, we propose a depth-aware hierarchical replay framework that includes a tree-structured classifier to capture the knowledge distribution and bridge the gap between text classification and question-answering tasks in continual learning, a local sampler to capture critical samples, and a depth-aware learning network to reconstruct the feature space of a single learning round. Our experiments demonstrate that the proposed model outperforms previous continual learning methods in mitigating catastrophic forgetting.
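The generic replay idea mentioned at the start of the abstract, keeping a small memory of earlier data, can be sketched with a fixed-size buffer. The example below uses reservoir sampling, a standard technique chosen here for illustration; it is not the paper's depth-aware hierarchical sampler, and the class name `ReplayBuffer` is hypothetical.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReplayBuffer(Generic[T]):
    """Fixed-size memory filled by reservoir sampling, so every example
    seen so far has the same chance of being kept."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, item: T) -> None:
        """Offer one example from the data stream to the buffer."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Keep the new item with probability capacity / seen.
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = item

    def sample(self, k: int) -> List[T]:
        """Draw up to k stored examples to mix into the current batch."""
        return self._rng.sample(self.items, min(k, len(self.items)))


# Stream 100 hypothetical question ids through a buffer of size 3.
buffer: ReplayBuffer[int] = ReplayBuffer(capacity=3, seed=0)
for question_id in range(100):
    buffer.add(question_id)
print(len(buffer.items), buffer.seen)
```

During training on a new task, examples drawn with `sample` would be interleaved with new data so that earlier tasks keep contributing to the loss.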
2022
Prompt-learning for Fine-grained Entity Typing
Ning Ding | Yulin Chen | Xu Han | Guangwei Xu | Xiaobin Wang | Pengjun Xie | Haitao Zheng | Zhiyuan Liu | Juanzi Li | Hong-Gee Kim
Findings of the Association for Computational Linguistics: EMNLP 2022
As an effective approach to adapting pre-trained language models (PLMs) for specific tasks, prompt-learning has recently attracted much attention from researchers. By using cloze-style language prompts to stimulate the versatile knowledge of PLMs, prompt-learning can achieve promising results on a series of NLP tasks, such as natural language inference, sentiment classification, and knowledge probing. In this work, we investigate the application of prompt-learning to fine-grained entity typing in fully supervised, few-shot, and zero-shot scenarios. We first develop a simple and effective prompt-learning pipeline by constructing entity-oriented verbalizers and templates and conducting masked language modeling. Further, to tackle the zero-shot regime, we propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types. Extensive experiments on four fine-grained entity typing benchmarks under fully supervised, few-shot, and zero-shot settings show the effectiveness of the prompt-learning paradigm and further establish it as a powerful alternative to vanilla fine-tuning.
Co-authors
- Hai-Tao Zheng 6
- Yinghui Li 3
- Jingheng Ye 3
- Wenhao Jiang 2
- Yangning Li 2
- Tingwei Lu 2
- Jiwei Tang 2
- Zishan Xu 2
- Lichen Bai 1
- Zhixiong Cao 1
- Yulin Chen 1
- Ning Ding 1
- Lin Hai 1
- Xu Han 1
- Zihua Lan 1
- Rongsheng Li 1
- Juanzi Li 1
- Qi Li 1
- Zhixing Li 1
- Zhiyuan Liu 1
- Ruitong Liu 1
- Qingsong Lv 1
- Shang Qin 1
- Jinxiao Shan 1
- Zifei Shan 1
- Ying Shen 1
- Linlin Song 1
- Xin Su 1
- Xiaobin Wang 1
- Zitai Wang 1
- Shunlong Wu 1
- Pengjun Xie 1
- Jin Xu 1
- Guangwei Xu 1
- Philip S. Yu 1
- Zhicheng Zhang 1
- Yiming Zhao 1
- Qingyu Zhou 1