2025
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan
|
Wenxiang Jiao
|
Wenxuan Wang
|
Jen-tse Huang
|
Jiahao Xu
|
Tian Liang
|
Pinjia He
|
Zhaopeng Tu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models’ ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse to comply with harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by prepending a segment of a harmful response to a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. Our empirical evaluation, conducted on the LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses baseline methods in defending against attacks.
FanChuan: A Multilingual and Graph-Structured Benchmark For Parody Detection and Analysis
Yilun Zheng
|
Sha Li
|
Fangkun Wu
|
Yang Ziyi
|
Lin Hongchao
|
Zhichao Hu
|
Cai Xinjun
|
Ziming Wang
|
Jinxuan Chen
|
Sitao Luan
|
Jiahao Xu
|
Lihui Chen
Findings of the Association for Computational Linguistics: ACL 2025
Parody is an emerging phenomenon on social media, where individuals imitate a role or position opposite to their own, often for humor, provocation, or controversy. Detecting and analyzing parody can be challenging and often relies on context, yet it plays a crucial role in understanding cultural values, promoting subcultures, and enhancing self-expression. However, the study of parody is hindered by limited available data and the deficient diversity of current datasets. To bridge this gap, we built seven parody datasets from both English and Chinese corpora, with 14,755 annotated users and 21,210 annotated comments in total. We also collect replies and construct user-interaction graphs to provide richer contextual information, which is lacking in existing datasets. With these datasets, we test traditional methods and Large Language Models (LLMs) on three key tasks: (1) parody detection, (2) comment sentiment analysis with parody, and (3) user sentiment analysis with parody. Our extensive experiments reveal that parody-related tasks remain challenging for all models and that contextual information plays a critical role. Interestingly, we find that, in certain scenarios, traditional sentence embedding methods combined with simple classifiers can outperform advanced LLMs, e.g., DeepSeek-R1 and GPT-o3, highlighting parody as a significant challenge for LLMs.
2024
TransAgents: Build Your Translation Company with Language Agents
Minghao Wu
|
Jiahao Xu
|
Longyue Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Multi-agent systems empowered by large language models (LLMs) have demonstrated remarkable capabilities in a wide range of downstream applications. In this work, we introduce TransAgents, a novel multi-agent translation system inspired by human translation companies. TransAgents employs specialized agents—Senior Editor, Junior Editor, Translator, Localization Specialist, and Proofreader—to collaboratively produce translations that are accurate, culturally sensitive, and of high quality. Our system is flexible, allowing users to configure their translation company based on specific needs, and universal, with empirical evidence showing superior performance across various domains compared to state-of-the-art methods. Additionally, TransAgents features a user-friendly interface and offers translations at a cost approximately 80× cheaper than professional human translation services. Evaluations on literary, legal, and financial test sets demonstrate that TransAgents produces translations preferred by human evaluators, even surpassing human-written references in literary contexts. Our live demo website is available at https://www.transagents.ai/. Our demonstration video is available at https://www.youtube.com/watch?v=p7jIAtF-WKc.
Findings of the WMT 2024 Shared Task on Discourse-Level Literary Translation
Longyue Wang
|
Siyou Liu
|
Chenyang Lyu
|
Wenxiang Jiao
|
Xing Wang
|
Jiahao Xu
|
Zhaopeng Tu
|
Yan Gu
|
Weiyu Chen
|
Minghao Wu
|
Liting Zhou
|
Philipp Koehn
|
Andy Way
|
Yulin Yuan
Proceedings of the Ninth Conference on Machine Translation
Translating literary works has perennially stood as an elusive dream in machine translation (MT), a journey steeped in intricate challenges. To foster progress in this domain, we held a shared task at WMT 2024, the second edition of the Discourse-Level Literary Translation task. First, we (Tencent AI Lab and China Literature Ltd.) release a copyrighted, document-level Chinese-English web novel corpus. Furthermore, we put forth industry-endorsed criteria to guide the human evaluation process. This year, we received a total of 10 submissions from 5 academia and industry teams. We employ both automatic and human evaluations to measure the performance of the submitted systems. The official ranking of the systems is based on the overall human judgments. In addition, our extensive analysis reveals a series of interesting findings on literary and discourse-aware MT. We release data, system outputs, and the leaderboard at https://www2.statmt.org/wmt24/literary-translation-task.html.
2023
SimCSE++: Improving Contrastive Learning for Sentence Embeddings from Two Perspectives
Jiahao Xu
|
Wei Shao
|
Lihui Chen
|
Lemao Liu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
This paper improves contrastive learning for sentence embeddings from two perspectives: handling dropout noise and addressing feature corruption. Specifically, for the first perspective, we identify that the dropout noise from negative pairs affects the model’s performance, and we propose a simple yet effective method to deal with this type of noise. Secondly, we pinpoint the rank bottleneck of current solutions to feature corruption and propose a dimension-wise contrastive learning objective to address this issue. Both proposed methods are generic and can be applied to any contrastive-learning-based model for sentence embeddings. Experimental results on standard benchmarks demonstrate that combining both proposed methods leads to a gain of 1.8 points over the strong baseline SimCSE configured with BERT-base. Furthermore, applying the proposed methods to DiffCSE, another strong contrastive-learning-based baseline, results in a gain of 1.4 points.
DistillCSE: Distilled Contrastive Learning for Sentence Embeddings
Jiahao Xu
|
Wei Shao
|
Lihui Chen
|
Lemao Liu
Findings of the Association for Computational Linguistics: EMNLP 2023
This paper proposes the DistillCSE framework, which performs contrastive learning under the self-training paradigm with knowledge distillation. The potential advantage of DistillCSE is its self-enhancing feature: using a base model to provide additional supervision signals, a stronger model may be learned through knowledge distillation. However, vanilla DistillCSE with the standard implementation of knowledge distillation achieves only marginal improvements. Our quantitative analyses reveal the reason: standard knowledge distillation exhibits a relatively large variance in the teacher model’s logits due to the nature of contrastive learning. To mitigate the issue induced by this high variance, this paper accordingly proposes two simple yet effective solutions for knowledge distillation: a Group-P shuffling strategy as an implicit regularization and averaging the logits from multiple teacher components. Experiments on standard benchmarks demonstrate that the proposed DistillCSE outperforms many strong baseline methods and yields new state-of-the-art performance.
2022
On Synthetic Data for Back Translation
Jiahao Xu
|
Yubin Ruan
|
Wei Bi
|
Guoping Huang
|
Shuming Shi
|
Lihui Chen
|
Lemao Liu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Back translation (BT) is one of the most important techniques in neural machine translation (NMT) research. Existing approaches to BT share a common characteristic: they employ either beam search or random sampling to generate synthetic data with a backward model, but little work studies the role of the synthetic data in BT performance. This motivates us to ask a fundamental question: what kind of synthetic data contributes to BT performance? Through both theoretical and empirical studies, we identify two key factors of synthetic data that control back-translation NMT performance: quality and importance. Furthermore, based on our findings, we propose a simple yet effective method to generate synthetic data that better trades off both factors so as to yield better BT performance. We run extensive experiments on the WMT14 DE-EN, EN-DE, and RU-EN benchmark tasks. By employing our proposed method to generate synthetic data, our BT model significantly outperforms the standard BT baselines (i.e., beam- and sampling-based methods for data generation), which demonstrates the effectiveness of our proposed method.