Xiaoliang Chen
2026
Penetrating Linguistic Disguises: A Slang-aware Label-Aligned Framework for Fine-Grained Toxicity Extraction in Chinese Hate Speech Detection
Wei Liu | Xiaoliang Chen | Duoqian Miao | Xu Gu | Xianyong Li | Yajun Du
Findings of the Association for Computational Linguistics: ACL 2026
Flexible word boundaries and linguistic obfuscation, particularly slang, make precise span-level hate speech detection in Chinese difficult. Benchmarks such as STATE ToxiCN require the exact extraction of Target-Argument-Hateful-Group quadruples, yet generative Large Language Models (LLMs) often fail to satisfy strict boundary constraints, and discriminative 2D Grid Tagging methods frequently suffer from label collisions. To address these problems, this study presents a Slang-aware Label-Aligned Framework. A Structural-Semantic Lexicon Fusion (SSLF) module reduces ambiguity by mapping obscure slang to explicit hate semantics. In addition, the proposed Label-Disentangled Volumetric Tagging (LDVT) projects token interactions into a volumetric space, using task-specific branches and dedicated label channels to structurally mitigate feature interference. This design eliminates label collisions without heuristic post-processing. On STATE ToxiCN, the framework achieves a Hard-F1 of 30.09%, 5.82% higher than the best fine-tuned LLM baseline, confirming its effectiveness for exact-match extraction.
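The notion of a "label collision" can be illustrated with a toy sketch. This is not the LDVT architecture (no task-specific branches or learned scores are modeled); it only assumes a hypothetical three-label tag set and shows why a single-label 2D grid loses information when two tags fall on the same token pair, while a volume with one binary channel per label keeps both.

```python
from typing import List, Optional, Tuple

# Hypothetical label set, for illustration only (not the paper's tag scheme).
LABELS = ["target-boundary", "argument-boundary", "target-argument-link"]

Relation = Tuple[int, int, str]  # (row token i, column token j, label)


def encode_grid(n: int, relations: List[Relation]) -> List[List[Optional[str]]]:
    """Single-label 2D grid: each (i, j) cell holds ONE tag, so a later
    relation on the same cell overwrites an earlier one (a label collision)."""
    grid: List[List[Optional[str]]] = [[None] * n for _ in range(n)]
    for i, j, label in relations:
        grid[i][j] = label
    return grid


def encode_volume(n: int, relations: List[Relation]) -> List[List[List[int]]]:
    """Volumetric n x n x C tensor: one binary channel per label, so several
    labels can coexist on the same token pair."""
    vol = [[[0] * len(LABELS) for _ in range(n)] for _ in range(n)]
    for i, j, label in relations:
        vol[i][j][LABELS.index(label)] = 1
    return vol


def decode_volume(scores: List[List[List[float]]], threshold: float = 0.5) -> List[Relation]:
    """Decode each channel independently (e.g. from per-channel sigmoid
    probabilities) instead of taking one argmax per cell."""
    out: List[Relation] = []
    for i, row in enumerate(scores):
        for j, channels in enumerate(row):
            for c, p in enumerate(channels):
                if p >= threshold:
                    out.append((i, j, LABELS[c]))
    return out


if __name__ == "__main__":
    rels = [(1, 3, "target-boundary"), (1, 3, "target-argument-link")]
    print(encode_grid(5, rels)[1][3])       # only the last label survives
    print(decode_volume(encode_volume(5, rels)))  # both labels recovered
```

Decoding channels independently is what allows overlapping tags to be recovered; a per-cell softmax over a single label dimension would reintroduce the one-tag-per-cell restriction.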
Mitigating Spurious Correlations in Text Classification Using Latent Space Geometry
Jiasen Gao | Xiaoliang Chen | Duoqian Miao | Xu Gu | Xianyong Li | Yajun Du
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Spurious correlations cause deep learning models to rely on predictive shortcuts that hold in the training data but break under distribution shift, leading to large performance drops for minority groups. Existing strategies often rely on costly group annotations or unstable adversarial training. In this paper, we propose Prototype-guided debiasing using Robust Invariant Feature Transformations (PRIFT), a novel framework that mitigates spurious correlations by manipulating latent space geometry. Specifically, a prototype-guided modeling approach uses natural language prompts to represent confounders, turning abstract biases into interpretable geometric anchors without auxiliary classifiers. Building on these anchors, a centered projection operator adaptively purifies representations by removing instance-specific confounding deviations while preserving essential semantic structure. Furthermore, PRIFT can exploit confounder information at different levels of supervision, ranging from true labels to unsupervised latent inference. Experiments on four text classification benchmarks demonstrate the effectiveness of our method: PRIFT outperforms state-of-the-art baselines and, on the CivilComments dataset, improves worst-group accuracy by over 20% compared to standard empirical risk minimization (ERM).
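One plausible reading of a "centered projection operator" is sketched below; it is an assumption, not PRIFT's published formulation. Given confounder prototype vectors (in PRIFT these come from encoding natural-language prompts; here they are simply given), compute their centroid c, build an orthonormal basis U of the centered prototypes, and remove the representation's deviation from c inside that subspace: h' = h - U Uᵀ (h - c). Components of h orthogonal to the confounder subspace are left untouched.

```python
import math
from typing import List

Vec = List[float]


def sub(a: Vec, b: Vec) -> Vec:
    return [x - y for x, y in zip(a, b)]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def scale(a: Vec, s: float) -> Vec:
    return [x * s for x in a]


def orthonormal_basis(vectors: List[Vec], eps: float = 1e-9) -> List[Vec]:
    """Gram-Schmidt; near-zero residuals (linearly dependent inputs) are skipped."""
    basis: List[Vec] = []
    for v in vectors:
        w = list(v)
        for u in basis:
            w = sub(w, scale(u, dot(w, u)))
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append(scale(w, 1.0 / norm))
    return basis


def centered_projection(h: Vec, prototypes: List[Vec]) -> Vec:
    """Hypothetical operator: h' = h - U U^T (h - c), where c is the prototype
    centroid and U spans the centered prototypes (the confounder subspace)."""
    center = [sum(col) / len(prototypes) for col in zip(*prototypes)]
    basis = orthonormal_basis([sub(p, center) for p in prototypes])
    deviation = sub(h, center)
    removed = [0.0] * len(h)
    for u in basis:
        removed = [r + x for r, x in zip(removed, scale(u, dot(deviation, u)))]
    return sub(h, removed)


if __name__ == "__main__":
    # Two toy "confounder" prototypes differing only along the first axis.
    protos = [[1.0, 2.0, 0.0], [-1.0, 2.0, 0.0]]
    print(centered_projection([3.0, 5.0, 7.0], protos))
```

In this toy case the first coordinate (the only direction along which the prototypes differ) is reset to the centroid's value, while the other two coordinates are preserved.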
2025
DiaDP@XLLM25: Advancing Chinese Dialogue Parsing via Unified Pretrained Language Models and Biaffine Dependency Scoring
Shuoqiu Duan | Xiaoliang Chen | Duoqian Miao | Xu Gu | Xianyong Li | Yajun Du
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)
Dialogue-level dependency parsing is crucial for understanding complex linguistic structures in conversational data, yet progress has been hindered by limited annotated resources and inadequate modeling of dialogue dynamics. Existing methods often fail to capture both intra- and inter-utterance dependencies effectively, particularly in languages like Chinese with rich contextual interactions. To address these challenges, we propose InterParser, a novel framework that integrates a pretrained language model (PLM), bidirectional GRU (BiGRU), and biaffine scoring for comprehensive dependency parsing. Our model encodes token sequences using a PLM, refines representations via deep BiGRU layers, and employs separate projections for “head” and “dependent” roles to optimize arc and relation prediction. For cross-utterance dependencies, speaker-specific feature projections are introduced to enhance dialogue-aware scoring. Joint training minimizes cross-entropy losses for both intra- and inter-utterance dependencies, ensuring unified optimization. Experiments on a standard Chinese benchmark demonstrate that InterParser significantly outperforms prior methods, achieving state-of-the-art labeled attachment scores (LAS) for both intra- and inter-utterance parsing.
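The biaffine arc scoring mentioned above can be sketched in the standard form score(i, j) = dᵢᵀ U hⱼ + uᵀ hⱼ, where dᵢ is token i's "dependent" projection and hⱼ is token j's "head" projection. The vectors below stand in for the PLM + BiGRU outputs after those separate projections; speaker-specific projections, relation labeling, and tree-constrained decoding are omitted, and the greedy head selection is a simplification, not InterParser's actual decoder.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(M: Mat, v: Vec) -> Vec:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def biaffine_arc_scores(dep: List[Vec], head: List[Vec], U: Mat, u: Vec) -> Mat:
    """scores[i][j] = dep[i]^T U head[j] + u^T head[j]:
    the score of token j being the head of token i."""
    return [[dot(d, matvec(U, h)) + dot(u, h) for h in head] for d in dep]


def predict_heads(scores: Mat) -> List[int]:
    """Greedy decoding: each dependent takes its highest-scoring head,
    excluding itself. Real parsers typically use MST/Eisner decoding."""
    heads: List[int] = []
    for i, row in enumerate(scores):
        candidates = [j for j in range(len(row)) if j != i]
        heads.append(max(candidates, key=lambda j: row[j]))
    return heads


if __name__ == "__main__":
    dep = [[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]]    # toy "dependent" projections
    head = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]   # toy "head" projections
    U = [[1.0, 0.0], [0.0, 1.0]]                  # bilinear weight (identity here)
    u = [0.0, 0.0]                                # head-only bias weight
    scores = biaffine_arc_scores(dep, head, U, u)
    print(predict_heads(scores))
```

Keeping separate dependent and head projections lets the same token be represented differently depending on its role, which is why the score matrix is not symmetric in general.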