Xiangdong Su - ACL Anthology

This page is part of a temporary preview of a proposed change that may be incomplete or contain mistakes. It is not official and will be removed when the change is merged or abandoned.

Xiangdong Su

2026

Who Wrote This Line? Evaluating the Detection of LLM-Generated Classical Chinese Poetry
Jiang Li | Tian Lan | Shanshan Wang | Zdongxing | Dianqing Lin | Guanglai Gao | Derek F. Wong | Xiangdong Su
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The rapid development of large language models (LLMs) has extended text generation tasks into the literary domain. However, AI-generated literary creations has raised increasingly prominent issues of creative authenticity and ethics in literary world, making the detection of LLM-generated literary texts essential and urgent. While previous works have made significant progress in detecting AI-generated text, it has yet to address classical Chinese poetry. Due to the unique linguistic features of classical Chinese poetry, such as strict metrical regularity, a shared system of poetic imagery, and flexible syntax, distinguishing whether a poem is authored by AI presents a substantial challenge. To address these issues, we introduce ChangAn, a benchmark for detecting LLM-generated classical Chinese poetry that containing total 30,664 poems, 10,276 are human-written poems and 20,388 poems are generated by four popular LLMs. Based on ChangAn, we conducted a systematic evaluation of 12 AI detectors, investigating their performance variations across different text granularities and generation strategies. Our findings highlight the limitations of current Chinese text detectors, which fail to serve as reliable tools for detecting LLM-generated classical Chinese poetry. These results validate the effectiveness and necessity of our proposed ChangAn benchmark. Our dataset and code are available at https://github.com/VelikayaScarlet/ChangAn.

CEDAR: A Chinese Evaluation Dataset for Computational Argumentation
Tian Lan | Jiang Li | Rong Yan | Feilong Bao | Weihua Wang | Guanglai Gao | Xiangdong Su
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Computational argumentation has received increasing attention in recent years. However, existing debate datasets neglect some important labels for argument mining, generation, and evaluation. Meanwhile, the lack of comprehensively annotated Chinese oral debate datasets hinders progress in this field. To address these gaps, we introduce a comprehensive Chinese Evaluation Dataset for Computational Argumentation, named CEDAR. Compared to previous datasets, CEDAR includes the essential labels of computational argumentation (claim, stance, evidence) and five additional crucial labels: rhetorical figures, debater roles, modal words, utterance time, and debate results. Moreover, it offers complete transcripts of each debate, including speeches from the Pro and Con sides. Thus, the proposed CEDAR not only supports common argument mining and generation tasks, but also provides resources for rhetorical figure detection, argument quality evaluation, and debate result prediction. This dataset covers 600 debates about 318 topics from Chinese debate competitions. Besides providing a dataset for research, we conduct experiments on common computational argument tasks and a novel task (rhetorical figure detection), in which we also evaluate LLMs. The experimental results highlight the challenging nature of the dataset. Our corpus is available at https://github.com/VelikayaScarlet/CEDAR.

Know Your Place: Diagnosing Implicit Social Adaptation Failures in Chinese Large Language Models
Yu Tian | Jie Xing | Ziming Li | Jiang Li | Zehua Duo | Tian Lan | Xu Liu | Guanglai Gao | Xiangdong Su
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As large language models (LLMs) are increasingly deployed in dialogue systems and interactive agents, their social adaptation during natural interaction has drawn growing attention. While prior work shows strong social regulation under explicit role or style instructions, it remains unclear whether LLMs can spontaneously perceive and respond to implicit social differences without explicit prompts. Focusing on high-context Chinese interactions, we identify a robust phenomenon termed Social Agnosia, where LLMs fail to adequately perceive and accommodate implicit social power, affective arousal, and epistemic status during natural interaction. To diagnose this behavior, we propose C-ISA, a framework grounded in Communication Accommodation Theory that decomposes social adaptation into three approximately orthogonal dimensions, and conduct controlled comparisons across multiple Chinese LLMs under implicit and explicit conditions. Results show that while models substantially adjust linguistic strategies under explicit conditioning, they exhibit socially insensitive and homogenized responses in natural interaction, revealing a structural gap between spontaneous behavior and conditioned capability. The C-ISA dataset is publicly available at https://github.com/ty373/C-ISA.

2025

Mitigating Heterogeneity among Factor Tensors via Lie Group Manifolds for Tensor Decomposition Based Temporal Knowledge Graph Embedding
Jiang Li | Xiangdong Su | Guanglai Gao
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Recent studies have highlighted the effectiveness of tensor decomposition methods in the Temporal Knowledge Graphs Embedding (TKGE) task. However, we found that inherent heterogeneity among factor tensors in tensor decomposition significantly hinders the tensor fusion process and further limits the performance of link prediction. To overcome this limitation, we introduce a novel method that maps factor tensors onto a unified smooth Lie group manifold to make the distribution of factor tensors approximating homogeneous in tensor decomposition. We provide the theoretical proof of our motivation that homogeneous tensors are more effective than heterogeneous tensors in tensor fusion and approximating the target for tensor decomposition based TKGE methods. The proposed method can be directly integrated into existing tensor decomposition based TKGE methods without introducing extra parameters. Extensive experiments demonstrate the effectiveness of our method in mitigating the heterogeneity and in enhancing the tensor decomposition based TKGE models.

Leveraging 3D Gaussian for Temporal Knowledge Graph Embedding
Jiang Li | Xiangdong Su | Guanglai Gao
Findings of the Association for Computational Linguistics: EMNLP 2025

Representation learning in knowledge graphs (KGs) has predominantly focused on static data, yet many real-world knowledge graphs are inherently dynamic. For instance, the fact (The CEO of Apple, holds position, Steve Jobs) was valid until 2011, after which it changed, emphasizing the need to incorporate temporal information into knowledge representation. In this paper, we propose 3DG-TE, a novel temporal KG embedding method inspired by 3D Gaussian Splatting, where entities, relations, and timestamps are modeled as 3D Gaussian distributions with learnable structured covariance. This approach optimizes the Gaussian distributions of entities, relations, and timestamps to improve the overall KG representation. To effectively capture temporal-relational interactions, we design structured covariances that form composite transformation operators: relations induce rotational transformations, while timestamps regulate adaptive scaling. We also design a compound scoring function that integrates mean positions and structured covariance, preserving geometric interpretability. Experimental results on three benchmark TKG datasets demonstrate that 3DG-TE outperforms state-of-the-art baselines in temporal link prediction tasks. Theoretical analysis further confirms our model’s ability to capture key relation patterns.

McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
Tian Lan | Xiangdong Su | Xu Liu | Ruirui Wang | Ke Chang | Jiang Li | Guanglai Gao
Findings of the Association for Computational Linguistics: ACL 2025

As large language models (LLMs) are increasingly applied to various NLP tasks, their inherent biases are gradually disclosed. Therefore, measuring biases in LLMs is crucial to mitigate its ethical risks. However, most existing bias evaluation datasets are focus on English andNorth American culture, and their bias categories are not fully applicable to other cultures. The datasets grounded in the Chinese language and culture are scarce. More importantly, these datasets usually only support single evaluation task and cannot evaluate the bias from multiple aspects in LLMs. To address these issues, we present a Multi-task Chinese Bias Evaluation Benchmark (McBE) that includes 4,077 bias evaluation instances, covering 12 single bias categories, 82 subcategories and introducing 5 evaluation tasks, providing extensive category coverage, content diversity, and measuring comprehensiveness. Additionally, we evaluate several popular LLMs from different series and with parameter sizes. In general, all these LLMs demonstrated varying degrees of bias. We conduct an in-depth analysis of results, offering novel insights into bias in LLMs.

F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations
Tian Lan | Jiang Li | Yemin Wang | Xu Liu | Xiangdong Su | Guanglai Gao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

With the growing adoption of large language models (LLMs) in NLP tasks, concerns about their fairness have intensified. Yet, most existing fairness benchmarks rely on closed-ended evaluation formats, which diverge from real-world open-ended interactions. These formats are prone to position bias and introduce a “minimum score” effect, where models can earn partial credit simply by guessing. Moreover, such benchmarks often overlook factuality considerations rooted in historical, social, physiological, and cultural contexts, and rarely account for intersectional biases. To address these limitations, we propose F²Bench: an open-ended fairness evaluation benchmark for LLMs that explicitly incorporates factuality considerations. F²Bench comprises 2,568 instances across 10 demographic groups and two open-ended tasks. By integrating text generation, multi-turn reasoning, and factual grounding, F²Bench aims to more accurately reflect the complexities of real-world model usage. We conduct a comprehensive evaluation of several LLMs across different series and parameter sizes. Our results reveal that all models exhibit varying degrees of fairness issues. We further compare open-ended and closed-ended evaluations, analyze model-specific disparities, and provide actionable recommendations for future model development. Our code and dataset are publicly available at https://github.com/VelikayaScarlet/F2Bench.

C3LRSO: A Chinese Corpus for Complex Logical Reasoning in Sentence Ordering
Xiaotao Guo | Jiang Li | Xiangdong Su | Fujun Zhang
Proceedings of the 31st International Conference on Computational Linguistics

Sentence ordering is the task of rearranging a set of unordered sentences into a coherent and logically consistent sequence. Recent work has primarily used pre-trained language models, achieving significant success in the task. However, existing sentence ordering corpora are predominantly in English, and comprehensive benchmark datasets for non-English languages are unavailable. Meanwhile, current datasets often insert specific markers into paragraphs, inadvertently making the logical sequence between sentences more apparent and reducing the models’ ability to handle genuinely unordered sentences in real applications. To address these limitations, we develop C3LRSO, a high-quality Chinese sentence ordering dataset that overcomes the aforementioned shortcomings by providing genuinely unordered sentences without artificial segmentation cues. Furthermore, given the outstanding performance of large language models on NLP tasks, we evaluate these models on our dataset for this task. Additionally, we propose a simple yet effective parameter-free approach that outperforms existing methods on this task. Experiments demonstrate the challenging nature of the dataset and the strong performance of our proposed method. These findings highlight the potential for further research in sentence ordering and the development of more robust language models. Our dataset is freely available at https://github.com/JasonGuo1/C3LRSO.

A Mutual Information Perspective on Knowledge Graph Embedding
Jiang Li | Xiangdong Su | Zehua Duo | Tian Lan | Xiaotao Guo | Guanglai Gao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge graph embedding techniques have emerged as a critical approach for addressing the issue of missing relations in knowledge graphs. However, existing methods often suffer from limitations, including high intra-group similarity, loss of semantic information, and insufficient inference capability, particularly in complex relation patterns such as 1-N and N-1 relations. To address these challenges, we introduce a novel KGE framework that leverages mutual information maximization to improve the semantic representation of entities and relations. By maximizing the mutual information between different components of triples, such as (h, r) and t, or (r, t) and h, the proposed method improves the model’s ability to preserve semantic dependencies while maintaining the relational structure of the knowledge graph. Extensive experiments on benchmark datasets demonstrate the effectiveness of our approach, with consistent performance improvements across various baseline models. Additionally, visualization analyses and case studies demonstrate the improved ability of the MI framework to capture complex relation patterns.

2024

Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models
Yi Luo | Zhenghao Lin | YuHao Zhang | Jiashuo Sun | Chen Lin | Chengjin Xu | Xiangdong Su | Yelong Shen | Jian Guo | Yeyun Gong
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Language Models (LLMs) exhibit impressive capabilities but also present risks such as biased content generation and privacy issues. One of the current alignment techniques includes principle-driven integration, but it faces challenges arising from the imprecision of manually crafted rules and inadequate risk perception in models without safety training. To address these, we introduce Guide-Align, a two-stage approach. Initially, a safety-trained model identifies potential risks and formulates specific guidelines for various inputs, establishing a comprehensive library of guidelines and a model for input-guidelines retrieval. Subsequently, the retrieval model correlates new inputs with relevant guidelines, which guide LLMs in response generation to ensure safe and high-quality outputs, thereby aligning with human values. An additional optional stage involves fine-tuning a model with well-aligned datasets generated through the process implemented in the second stage.Our method customizes guidelines to accommodate diverse inputs, thereby enhancing the fine-grainedness and comprehensiveness of the guideline library. Furthermore, it incorporates safety expertise from a safety-trained LLM through a lightweight retrieval model.We evaluate our approach on three benchmarks, demonstrating significant improvements in LLM security and quality. Notably, our fine-tuned model, Labrador, even at 13 billion parameters, outperforms GPT-3.5-turbo and surpasses GPT-4 in alignment capabilities.

Hyperbolic Representations for Prompt Learning
Nan Chen | Xiangdong Su | Feilong Bao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Continuous prompt tuning has gained significant attention for its ability to train only continuous prompts while freezing the language model. This approach greatly reduces the training time and storage for downstream tasks. In this work, we delve into the hierarchical relationship between the prompts and downstream text inputs. In prompt learning, the prefix prompt acts as a module to guide the downstream language model, establishing a hierarchical relationship between the prefix prompt and subsequent inputs. Furthermore, we explore the benefits of leveraging hyperbolic space for modeling hierarchical structures. We project representations of pre-trained models from Euclidean space into hyperbolic space using the Poincaré disk which effectively captures the hierarchical relationship between the prompt and input text. The experiments on natural language understanding (NLU) tasks illustrate that hyperbolic space can model the hierarchical relationship between prompt and text input. We release our code at https://github.com/myaxxxxx/Hyperbolic-Prompt-Learning.

Exploring the Synergy of Dual-path Encoder and Alignment Module for Better Graph-to-Text Generation
Tianxin Zhao | Yingxin Liu | Xiangdong Su | Jiang Li | Guanglai Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The mainstream approaches view the knowledge graph-to-text (KG-to-text) generation as a sequence-to-sequence task and fine-tune the pre-trained model (PLM) to generate the target text from the linearized knowledge graph. However, the linearization of knowledge graphs and the structure of PLMs lead to the loss of a large amount of graph structure information. Moreover, PLMs lack an explicit graph-text alignment strategy because of the discrepancy between structural and textual information. To solve these two problems, we propose a synergetic KG-to-text model with a dual-path encoder, an alignment module, and a guidance module. The dual-path encoder consists of a graph structure encoder and a text encoder, which can better encode the structure and text information of the knowledge graph. The alignment module contains a two-layer Transformer block and an MLP block, which aligns and integrates the information from the dual encoder. The guidance module combines an improved pointer network and an MLP block to avoid error-generated entities and ensures the fluency and accuracy of the generated text. Our approach obtains very competitive performance on three benchmark datasets. Our code is available from https://github.com/IMu-MachineLearningsxD/G2T.

EpLSA: Synergy of Expert-prefix Mixtures and Task-Oriented Latent Space Adaptation for Diverse Generative Reasoning
Fujun Zhang | Xiangdong Su | Jiang Li | Rong Yan | Guanglai Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Existing models for diverse generative reasoning still struggle to generate multiple unique and plausible results. Through an in-depth examination, we argue that it is critical to leverage a mixture of experts as prefixes to enhance the diversity of generated results and make task-oriented adaptation in the latent space of the generation models to improve the quality of the responses. At this point, we propose EpLSA, an innovative model based on the synergy of expert-prefix mixtures and task-oriented latent space adaptation for diverse generative reasoning. Specifically, we use expert-prefixes mixtures to encourage the model to create multiple responses with different semantics and design a loss function to address the problem that the semantics is interfered by the expert-prefixes. Meanwhile, we design a task-oriented adaptation block to make the pre-trained encoder within the generation model more effectively adapted to the pre-trained decoder in the latent space, thus further improving the quality of the generated text. Extensive experiments on three different types of generative reasoning tasks demonstrate that EpLSA outperforms existing baseline models in terms of both the quality and diversity of the generated outputs. Our code is publicly available at https://github.com/IMU-MachineLearningSXD/EpLSA.

TransERR: Translation-based Knowledge Graph Embedding via Efficient Relation Rotation
Jiang Li | Xiangdong Su | Fujun Zhang | Guanglai Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a translation-based knowledge geraph embedding method via efficient relation rotation (TransERR), a straightforward yet effective alternative to traditional translation-based knowledge graph embedding models. Different from the previous translation-based models, TransERR encodes knowledge graphs in the hypercomplex-valued space, thus enabling it to possess a higher degree of translation freedom in mining latent information between the head and tail entities. To further minimize the translation distance, TransERR adaptively rotates the head entity and the tail entity with their corresponding unit quaternions, which are learnable in model training. We also provide mathematical proofs to demonstrate the ability of TransERR in modeling various relation patterns, including symmetry, antisymmetry, inversion, composition, and subrelation patterns. The experiments on 10 benchmark datasets validate the effectiveness and the generalization of TransERR. The results also indicate that TransERR can better encode large-scale datasets with fewer parameters than the previous translation-based models. Our code and datasets are available at https://github.com/dellixx/TransERR.

APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning
Jiashuo Sun | Hang Zhang | Chen Lin | Xiangdong Su | Yeyun Gong | Jian Guo
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Long-form numerical reasoning aims to generate a reasoning program to calculate the answer for a given question. Previous work followed a retriever-generator framework, where the retriever selects key facts from a long-form document, and the generator generates a reasoning program based on the retrieved facts. However, they treated all facts equally without considering the different contributions of facts with and without numerical information. Furthermore, they ignored program consistency, leading to the wrong punishment of programs that differed from the ground truth. In order to address these issues, we proposed APOLLO (An optimized training aPproach fOr Long-form numericaL reasOning), to improve long-form numerical reasoning. APOLLO includes a number-aware negative sampling strategy for the retriever to discriminate key numerical facts, and a consistency-based reinforcement learning with target program augmentation for the generator to ultimately increase the execution accuracy. Experimental results on the FinQA and ConvFinQA leaderboards verify the effectiveness of our proposed methods, achieving the new state-of-the-art.

Learning Low-dimensional Multi-domain Knowledge Graph Embedding via Dual Archimedean Spirals
Jiang Li | Xiangdong Su | Fujun Zhang | Guanglai Gao
Findings of the Association for Computational Linguistics: ACL 2024

Knowledge graph embedding (KGE) is extensively employed for link prediction by representing entities and relations as low-dimensional vectors. In real-world scenarios, knowledge graphs (KGs) usually encompass diverse domains, which poses challenges to KG representations. However, existing KGE methods rarely make domain constraints on the embedding distribution of multi-domain KGs, leading to the embedding overlapping of different domains and performance degradation of link prediction. To address this challenge, we propose Dual Archimedean Spiral Knowledge Graph Embedding (DuASE), a low-dimensional KGE model for multi-domain KGs. DuASE is inspired by our discovery that relation types can distinguish entities from different domains. Specifically, DuASE encodes entities with the same relation on the same Archimedean spiral, allowing it to differentiate the entities from different domains. To avoid embedding overlapping across domains, DuASE further makes the head and the tail spirals in the same triplet cluster to their respective domain space by a regularization function. Thus, DuASE can better capture the domain information and the dependencies between entities when modeling the multi-domain KGs, leading to improved KG representations. We validate the effectiveness of DuASE on the novel multi-domain dataset (n-MDKG) introduced in this study and three other benchmark datasets.

2023

How Well Apply Simple MLP to Incomplete Utterance Rewriting?
Jiang Li | Xiangdong Su | Xinlan Ma | Guanglai Gao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Incomplete utterance rewriting (IUR) aims to restore the incomplete utterance with sufficient context information for comprehension. This paper introduces a simple yet efficient IUR method. Different from prior studies, we first employ only one-layer MLP architecture to mine latent semantic information between joint utterances for IUR task (MIUR). After that, we conduct a joint feature matrix to predict the token type and thus restore the incomplete utterance. The well-designed network and simple architecture make our method significantly superior to existing methods in terms of quality and inference speedOur code is available at https://github.com/IMU-MachineLearningSXD/MIUR.

TeAST: Temporal Knowledge Graph Embedding via Archimedean Spiral Timeline
Jiang Li | Xiangdong Su | Guanglai Gao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Temporal knowledge graph embedding (TKGE) models are commonly utilized to infer the missing facts and facilitate reasoning and decision-making in temporal knowledge graph based systems. However, existing methods fuse temporal information into entities, potentially leading to the evolution of entity information and limiting the link prediction performance of TKG. Meanwhile, current TKGE models often lack the ability to simultaneously model important relation patterns and provide interpretability, which hinders their effectiveness and potential applications. To address these limitations, we propose a novel TKGE model which encodes Temporal knowledge graph embeddings via Archimedean Spiral Timeline (TeAST), which maps relations onto the corresponding Archimedean spiral timeline and transforms the quadruples completion to 3th-order tensor completion problem. Specifically, the Archimedean spiral timeline ensures that relations that occur simultaneously are placed on the same timeline, and all relations evolve over time. Meanwhile, we present a novel temporal spiral regularizer to make the spiral timeline orderly. In addition, we provide mathematical proofs to demonstrate the ability of TeAST to encode various relation patterns. Experimental results show that our proposed model significantly outperforms existing TKGE methods. Our code is available at https://github.com/IMU-MachineLearningSXD/TeAST.

2020

Incorporating Inner-word and Out-word Features for Mongolian Morphological Segmentation
Na Liu | Xiangdong Su | Haoran Zhang | Guanglai Gao | Feilong Bao
Proceedings of the 28th International Conference on Computational Linguistics

Mongolian morphological segmentation is regarded as a crucial preprocessing step in many Mongolian related NLP applications and has received extensive attention. Recently, end-to-end segmentation approaches with long short-term memory networks (LSTM) have achieved excellent results. However, the inner-word features among characters in the word and the out-word features from context are not well utilized in the segmentation process. In this paper, we propose a neural network incorporating inner-word and out-word features for Mongolian morphological segmentation. The network consists of two encoders and one decoder. The inner-word encoder uses the self-attention mechanisms to capture the inner-word features of the target word. The out-word encoder employs a two layers BiLSTM network to extract out-word features in the sentence. Then, the decoder adopts a multi-head double attention layer to fuse the inner-word features and out-word features and produces the segmentation result. The evaluation experiment compares the proposed network with the baselines and explores the effectiveness of the sub-modules.

Co-authors

Rong Yan (闫蓉) 2

Shanshan Wang 1

Derek F. Wong (黄辉) 1

Venues