Yuanyuan Liang

2026

COCOGEC: Counterfactual Generation for Robust Grammatical Error Correction
Qianyu Wang | Xiaoman Wang | Yuanyuan Liang | Xinyuan Li | Yunshi Lan
Findings of the Association for Computational Linguistics: ACL 2026

Grammatical error correction (GEC) systems are usually trained and evaluated on GEC benchmarks, but their performance often drops sharply once the surrounding context is slightly perturbed or extended. This indicates that existing GEC models often fail to understand error patterns in varying contexts. In this paper, we thoroughly investigate counterfactuals for GEC tasks, where subtle changes to the context can lead to label flipping. We address this robustness gap by viewing contextual variation through the lens of counterfactual data. We propose CoCoGEC, a counterfactual generation framework that creates copies of training instances with error-irrelevant contexts altered. Our framework systematically generates counterfactuals by (1) generating intra- and inter-sentence counterfactuals that maintain the error patterns as well as the syntax of the original instances by altering the word-level and sentence-level contexts; and (2) revising the generated counterfactuals by selecting the instances with flipped labels and a high GEC Mutual Information (MI) coefficient. Extensive experiments show that our method substantially improves the stability of GEC models, outperforming a set of data augmentation baselines. In particular, it can achieve absolute F_0.5 gains of +9.9, +11.3, and +20.8 points on the perturbed BEA-19*, CoNLL-14*, and TEM-8* datasets. Our code is released at https://github.com/Quinnok/CoCoGEC.

ToxiTrace: Gradient-Aligned Training for Explainable Chinese Toxicity Detection
Boyang Li | Hongzhe Shou | Yuanyuan Liang | JingBin Zhang | Fang Zhou
Findings of the Association for Computational Linguistics: ACL 2026

Existing Chinese toxic content detection methods mainly target sentence-level classification but often fail to provide readable and contiguous toxic evidence spans. We propose ToxiTrace, an explainability-oriented method for BERT-style encoders with three components: (1) CuSA, which refines encoder-derived saliency cues into fine-grained toxic spans with lightweight LLM guidance; (2) GCLoss, a gradient-constrained objective that concentrates token-level saliency on toxic evidence while suppressing irrelevant activations; and (3) ARCL, which constructs sample-specific contrastive reasoning pairs to sharpen the semantic boundary between toxic and non-toxic content. Experiments show that ToxiTrace improves classification accuracy and toxic span extraction while preserving efficient encoder-based inference and producing more coherent, human-readable explanations. The core training code is available at https://github.com/ZhouF-ECNU/ToxiTrace.

2025

The strong capability of large language models (LLMs) has been applied to information extraction (IE) through either retrieval augmented prompting or instruction tuning (IT). However, the best way to incorporate information with LLMs for IE remains an open question. In this paper, we explore Retrieval Augmented Instruction Tuning (RA-IT) for IE, focusing on the task of open named entity recognition (NER). Specifically, for each training sample, we retrieve semantically similar examples from the training dataset as the context and prepend them to the input of the original instruction. To evaluate our RA-IT approach more thoroughly, we construct a Chinese IT dataset for open NER and evaluate RA-IT in both English and Chinese scenarios. Experimental results verify the effectiveness of RA-IT across various data sizes and in both English and Chinese scenarios. We also conduct thorough studies to explore the impacts of various retrieval strategies in the proposed RA-IT framework.

2023

Improving Cascade Decoding with Syntax-aware Aggregator and Contrastive Learning for Event Extraction
Zeyu Sheng | Yuanyuan Liang | Yunshi Lan
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

The cascade decoding framework has shown superior performance on event extraction tasks. However, it treats a sentence as a sequence and neglects the potential benefits of the syntactic structure of sentences. In this paper, we improve cascade decoding with a novel module and a self-supervised task. Specifically, we propose a syntax-aware aggregator module to model the syntax of a sentence based on the cascade decoding framework such that it captures event dependencies as well as syntactic information. Moreover, we design a type discrimination task to learn better syntactic representations of different event types, which could further boost the performance of event extraction. Experimental results on two widely used event extraction datasets demonstrate that our method could improve the original cascade decoding framework by up to 2.2 percentage points of F1 score and outperform a number of competitive baseline methods.

Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation
Yuanyuan Liang | Jianing Wang | Hanlun Zhu | Lei Wang | Weining Qian | Yunshi Lan
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The task of Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. Because large-scale question annotation is expensive, KBQG methods for low-resource scenarios urgently need to be developed. However, current methods rely heavily on annotated data for fine-tuning, which is not well suited to few-shot question generation. Large Language Models (LLMs) have shown impressive generalization ability in few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, an in-context learning strategy for reasoning, we formulate the KBQG task as a reasoning problem, where the generation of a complete question is split into a series of sub-question generation steps. Our proposed prompting method, KQG-CoT, first retrieves supportive logical forms from the unlabeled data pool, taking into account the characteristics of the logical form. Then, we write a prompt that makes explicit the reasoning chain for generating complicated questions based on the selected demonstrations. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ by sorting the logical forms by their complexity. We conduct extensive experiments over three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method could surpass existing few-shot SoTA results on the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.
