Ziyang Chen
2026
SemPA: Improving Sentence Embeddings of Large Language Models through Semantic Preference Alignment
Ziyang Chen | Zhenxuan Huang | Yile Wang | Weiqin Wang | Lu Yin | Hui Huang
Findings of the Association for Computational Linguistics: ACL 2026
Ziyang Chen | Zhenxuan Huang | Yile Wang | Weiqin Wang | Lu Yin | Hui Huang
Findings of the Association for Computational Linguistics: ACL 2026
Traditional sentence embedding methods employ token-level contrastive learning on non-generative pre-trained models. Recently, there have emerged embedding methods based on generative large language models (LLMs). These methods either rely on fixed prompt templates or involve modifications to the model architecture. The former lacks further optimization of the model and results in limited performance, while the latter alters the internal computational mechanisms of the model, thereby compromising its generative capabilities. We propose SemPA, a novel approach that boosts the sentence representations while preserving the generative ability of LLMs via semantic preference alignment. We leverage sentence-level Direct Preference Optimization (DPO) to efficiently optimize LLMs on a paraphrase generation task, where the model learns to discriminate semantically equivalent sentences while preserving inherent generative capacity. Theoretically, we establish a formal connection between DPO and contrastive learning under the Plackett-Luce model framework. Empirically, experimental results on both semantic textual similarity tasks and various benchmarks for LLMs show that SemPA achieves better semantic representations without sacrificing the inherent generation capability of LLMs.
Social Welfare Function Leaderboard: On the Emergence of LLM Agents as the Welfare Dictator
Zhengliang Shi | Ruotian Ma | Jen-tse Huang | Xinbei Ma | Xingyu Chen | Mengru Wang | Qu Yang | Yue Wang | Fanghua Ye | Ziyang Chen | Shanyi Wang | Cixing LI | Wenxuan Wang | Zhaopeng Tu | Xiaolong Li | Zhaochun Ren | Liefeng Bo
Findings of the Association for Computational Linguistics: ACL 2026
Zhengliang Shi | Ruotian Ma | Jen-tse Huang | Xinbei Ma | Xingyu Chen | Mengru Wang | Qu Yang | Yue Wang | Fanghua Ye | Ziyang Chen | Shanyi Wang | Cixing LI | Wenxuan Wang | Zhaopeng Tu | Xiaolong Li | Zhaochun Ren | Liefeng Bo
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. However, the principles and values that guide these models when distributing scarce societal resources remain largely unexamined. To address this, we introduce the Social Welfare Function (SWF) Benchmark, a dynamic simulation environment in which an LLM acts as a dictator, distributing tasks to heterogeneous recipients with different returns on investment (ROI). The benchmark is designed to create a dilemma between maximizing collective efficiency (i.e., overall ROI) and ensuring distributive fairness (measured by the Gini coefficient). We evaluate 20 state-of-the-art LLMs. Our findings reveal several key insights, including: (i) LLMs’ general ability, as measured by popular Arena leaderboards, misaligns with their allocation skills; (ii) Most LLMs exhibit a strong default utilitarian orientation, prioritizing overall productivity at the expense of inequality. (iii) Allocation behaviors are highly manipulated, easily perturbed by common persuasion strategies. These results highlight the risks of deploying current LLMs as societal decision-makers and underscore the need for specialized benchmarks and alignment for AI governance.
2025
ToolExpNet: Optimizing Multi-Tool Selection in LLMs with Similarity and Dependency-Aware Experience Networks
Zijing Zhang | Zhanpeng Chen | He Zhu | Ziyang Chen | Nan Du | Xiaolong Li
Findings of the Association for Computational Linguistics: ACL 2025
Zijing Zhang | Zhanpeng Chen | He Zhu | Ziyang Chen | Nan Du | Xiaolong Li
Findings of the Association for Computational Linguistics: ACL 2025
Tool learning enhances Large Language Models’ (LLMs) dynamic interaction with external tools, improving their ability to solve complex problems. However, current empirical methods, which primarily focus on isolated tools learning, still struggle with accurate multi-tool selection due to issues like confusing similar tools and neglecting dependencies. To address these challenges, we propose the Tool Experience Network (ToolExpNet), which integrates tools and trial-and-error experiences into a network characterized by semantic similarity and dependency relationships. ToolExpNet iteratively conducts simulated experiments using adaptive sampling to explore subtle differences and connections between tools, and summarizes these experiences to provide insightful guidance for LLM tool selection. Our experiments demonstrate that learning the relationships between tools helps achieve more comprehensive tool learning. Evaluations on multiple real-world API datasets show that ToolExpNet effectively addresses common challenges in multi-tool selection, significantly outperforming existing baselines across different foundation LLMs.
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Zhanpeng Chen | Mingxiao Li | Ziyang Chen | Nan Du | Xiaolong Li | Yuexian Zou
Findings of the Association for Computational Linguistics: ACL 2025
Zhanpeng Chen | Mingxiao Li | Ziyang Chen | Nan Du | Xiaolong Li | Yuexian Zou
Findings of the Association for Computational Linguistics: ACL 2025
Vision-language Models (VLMs) have shown remarkable capabilities in advancing general artificial intelligence, yet the irrational encoding of visual positions persists in inhibiting the models’ comprehensive perception performance across different levels of granularity. In this work, we propose Pyramid-descent Visual Position Encoding (PyPE), a novel approach designed to enhance the perception of visual tokens within VLMs. By assigning visual position indexes from the periphery to the center and expanding the central receptive field incrementally, PyPE addresses the limitations of traditional raster-scan methods and mitigates the long-term decay effects induced by Rotary Position Embedding (RoPE). Our method reduces the relative distance between interrelated visual elements and instruction tokens, promoting a more rational allocation of attention weights and allowing for a multi-granularity perception of visual elements and countering the over-reliance on anchor tokens. Extensive experimental evaluations demonstrate that PyPE consistently improves the general capabilities of VLMs across various sizes. Code is available at https://anonymous.4open.science/r/PyPE-34EE.
2024
Temporal Knowledge Question Answering via Abstract Reasoning Induction
Ziyang Chen | Dongfang Li | Xiang Zhao | Baotian Hu | Min Zhang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Ziyang Chen | Dongfang Li | Xiang Zhao | Baotian Hu | Min Zhang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this study, we address the challenge of enhancing temporal knowledge reasoning in Large Language Models (LLMs). LLMs often struggle with this task, leading to the generation of inaccurate or misleading responses. This issue mainly arises from their limited ability to handle evolving factual knowledge and complex temporal logic. To overcome these limitations, we propose Abstract Reasoning Induction (ARI) framework, which divides temporal reasoning into two distinct phases: Knowledge agnostic and Knowledge-based. This framework offers factual knowledge support to LLMs while minimizing the incorporation of extraneous noisy data. Concurrently, informed by the principles of constructivism, ARI provides LLMs the capability to engage in proactive, self-directed learning from both correct and incorrect historical reasoning samples. By teaching LLMs to actively construct knowledge and methods, it can significantly boosting their temporal reasoning abilities. Our approach achieves significant improvements, with relative gains of 29.7% and 9.27% on two temporal QA datasets, underscoring its efficacy in advancing temporal reasoning in LLMs. The code can be found at https: //github.com/czy1999/ARI-QA.
2023
Large Language Models Meet Harry Potter: A Dataset for Aligning Dialogue Agents with Characters
Nuo Chen | Yan Wang | Haiyun Jiang | Deng Cai | Yuhan Li | Ziyang Chen | Longyue Wang | Jia Li
Findings of the Association for Computational Linguistics: EMNLP 2023
Nuo Chen | Yan Wang | Haiyun Jiang | Deng Cai | Yuhan Li | Ziyang Chen | Longyue Wang | Jia Li
Findings of the Association for Computational Linguistics: EMNLP 2023
In recent years, Dialogue-style Large Language Models (LLMs) such as ChatGPT and GPT4 have demonstrated immense potential in constructing open-domain dialogue agents. However, aligning these agents with specific characters or individuals remains a considerable challenge due to the complexities of character representation and the lack of comprehensive annotations. In this paper, we introduce the Harry Potter Dialogue (HPD) dataset, designed to advance the study of dialogue agents and character alignment. The dataset encompasses all dialogue sessions (in both English and Chinese) from the Harry Potter series and is annotated with vital background information, including dialogue scenes, speakers, character relationships, and attributes. These extensive annotations may empower LLMs to unlock character-driven dialogue capabilities. Furthermore, it can serve as a universal benchmark for evaluating how well can a LLM aligning with a specific character. We benchmark LLMs on HPD using both fine-tuning and in-context learning settings. Evaluation results reveal that although there is substantial room for improvement in generating high-quality, character-aligned responses, the proposed dataset is valuable in guiding models toward responses that better align with the character of Harry Potter.
Multi-granularity Temporal Question Answering over Knowledge Graphs
Ziyang Chen | Jinzhi Liao | Xiang Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Ziyang Chen | Jinzhi Liao | Xiang Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recently, question answering over temporal knowledge graphs (i.e., TKGQA) has been introduced and investigated, in quest of reasoning about dynamic factual knowledge. To foster research on TKGQA, a few datasets have been curated (e.g., CronQuestions and Complex-CronQuestions), and various models have been proposed based on these datasets. Nevertheless, existing efforts overlook the fact that real-life applications of TKGQA also tend to be complex in temporal granularity, i.e., the questions may concern mixed temporal granularities (e.g., both day and month). To overcome the limitation, in this paper, we motivate the notion of multi-granularity temporal question answering over knowledge graphs and present a large scale dataset for multi-granularity TKGQA, namely MultiTQ. To the best of our knowledge, MultiTQis among the first of its kind, and compared with existing datasets on TKGQA, MultiTQfeatures at least two desirable aspects—ample relevant facts and multiple temporal granularities. It is expected to better reflect real-world challenges, and serve as a test bed for TKGQA models. In addition, we propose a competing baseline MultiQA over MultiTQ, which is experimentally demonstrated to be effective in dealing with TKGQA. The data and code are released at https://github.com/czy1999/MultiTQ.
Search
Fix author
Co-authors
- Xiaolong Li 3
- Zhanpeng Chen 2
- Nan Du 2
- Xiang Zhao 2
- Liefeng Bo 1
- Deng Cai 1
- Nuo Chen 1
- Xingyu Chen 1
- Baotian Hu 1
- Zhenxuan Huang 1
- Hui Huang 1
- Jen-tse Huang 1
- Haiyun Jiang 1
- Cixing LI 1
- Yuhan Li 1
- Jia Li 1
- Dongfang Li 1
- Mingxiao Li 1
- Jinzhi Liao 1
- Ruotian Ma 1
- Xinbei Ma 1
- Zhaochun Ren 1
- Zhengliang Shi 1
- Zhaopeng Tu 1
- Yan Wang 1
- Longyue Wang 1
- Yile Wang 1
- Weiqin Wang 1
- Mengru Wang 1
- Yue Wang 1
- Shanyi Wang 1
- Wenxuan Wang 1
- Qu Yang 1
- Fanghua Ye 1
- Lu Yin 1
- Zijing Zhang 1
- Min Zhang 1
- He Zhu 1
- Yuexian Zou 1