2025
SDPO: Segment-Level Direct Preference Optimization for Social Agents
Aobo Kong | Wentao Ma | Shiwan Zhao | Yongbin Li | Yuchuan Wu | Ke Wang | Xiaoqian Liu | Qicheng Li | Yong Qin | Fei Huang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human preferences across various agent tasks. However, standard DPO focuses solely on individual turns, which limits its effectiveness in multi-turn social interactions. Several DPO-based multi-turn alignment methods with session-level data have shown potential in addressing this problem. While these methods consider multiple turns across entire sessions, they are often overly coarse-grained, introducing training noise, and lack robust theoretical support. To resolve these limitations, we propose Segment-Level Direct Preference Optimization (SDPO), which dynamically selects key segments within interactions to optimize multi-turn agent behavior. SDPO minimizes training noise and is grounded in a rigorous theoretical framework. Evaluations on the SOTOPIA benchmark demonstrate that SDPO-tuned agents consistently outperform both existing DPO-based methods and proprietary LLMs like GPT-4o, underscoring SDPO’s potential to advance the social intelligence of LLM-based agents. We release our code and data at https://anonymous.4open.science/r/SDPO-CE8F.
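To make the idea concrete, here is a minimal sketch (not the authors' released implementation) of a DPO-style loss whose token sums are restricted to a pre-selected dialogue segment; the function names, the `seg_mask_*` inputs, and the default beta are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sequence_logprob(logits, labels, mask):
    """Sum token log-probs over the positions selected by `mask`."""
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = logps.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return (token_logps * mask).sum(-1)

def segment_level_dpo_loss(policy_logits_w, policy_logits_l,
                           ref_logits_w, ref_logits_l,
                           labels_w, labels_l,
                           seg_mask_w, seg_mask_l, beta=0.1):
    """DPO objective restricted to tokens of the selected dialogue segment.

    `seg_mask_*` is 1 only inside the chosen (preferred / dispreferred)
    segment, so turns outside the key segment contribute no gradient.
    """
    pi_w = sequence_logprob(policy_logits_w, labels_w, seg_mask_w)
    pi_l = sequence_logprob(policy_logits_l, labels_l, seg_mask_l)
    ref_w = sequence_logprob(ref_logits_w, labels_w, seg_mask_w)
    ref_l = sequence_logprob(ref_logits_l, labels_l, seg_mask_l)
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -F.logsigmoid(margin).mean()
```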
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
Xiaoqian Liu | Ke Wang | Yongbin Li | Yuchuan Wu | Wentao Ma | Aobo Kong | Fei Huang | Jianbin Jiao | Junge Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have shown impressive reasoning capabilities in well-defined problems with clear solutions, such as mathematics and coding. However, they still struggle with complex real-world scenarios like business negotiations, which require strategic reasoning, the ability to navigate dynamic environments and align long-term goals amidst uncertainty. Existing methods for strategic reasoning face challenges in adaptability, scalability, and transferring strategies to new contexts. To address these issues, we propose explicit policy optimization (EPO) for strategic reasoning, featuring an LLM that provides strategies in an open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. To improve adaptability and policy transferability, we train the strategic reasoning model via multi-turn reinforcement learning (RL), utilizing process rewards and iterative self-play. Experiments across social and physical domains demonstrate EPO’s ability to achieve long-term goal alignment through enhanced strategic reasoning, achieving state-of-the-art performance on social dialogue and web navigation tasks. Our findings reveal various collaborative reasoning mechanisms emergent in EPO and its effectiveness in generating novel strategies, underscoring its potential for strategic reasoning in real-world applications. Code and data are available at https://github.com/lxqpku/EPO.
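As a rough illustration of the pluggable design, the sketch below shows a strategy model whose free-form suggestion is simply prepended to an arbitrary dialogue agent's prompt; the `strategy_model` and `dialogue_agent` callables and the prompt format are hypothetical, and the multi-turn RL training loop is omitted.

```python
from typing import Callable, List

def strategy_guided_turn(history: List[str], goal: str,
                         strategy_model: Callable[[str], str],
                         dialogue_agent: Callable[[str], str]) -> str:
    """One interaction turn with an external strategic-reasoning model.

    The strategy model proposes a free-form strategy for reaching `goal`;
    the (frozen, arbitrary) dialogue agent is simply prompted with that
    strategy, so the reasoner can be trained separately while the agent
    stays plugged-in and unchanged.
    """
    context = f"Goal: {goal}\nDialogue so far:\n" + "\n".join(history)
    strategy = strategy_model(context)   # open-ended action space: a strategy string
    reply = dialogue_agent(context + f"\nSuggested strategy: {strategy}\nYour reply:")
    return reply
```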
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo | Tong Zheng | Yongyu Mu | Bei Li | Qinghong Zhang | Yongqi Gao | Ziqiang Xu | Peinan Feng | Xiaoqian Liu | Tong Xiao | JingBo Zhu
Findings of the Association for Computational Linguistics: ACL 2025
The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on the WMT and our datasets show that our method matches or surpasses a range of baselines in translation quality while achieving 2.4x to 6.5x inference speedups and a 75% reduction in the memory footprint of the KV cache. It also demonstrates strong generalization across a variety of translation-related tasks.
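A minimal sketch of the overall architecture, assuming the LLM's final hidden states are precomputed: a small adapter bridges the LLM width to a standard NMT decoder that cross-attends to them. Dimensions and layer counts are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LLMEncoderNMT(nn.Module):
    """Sketch: an LLM supplies source-side hidden states; a lightweight
    NMT decoder cross-attends to them through a small adapter."""

    def __init__(self, llm_dim=4096, d_model=512, vocab_size=32000,
                 n_layers=6, n_heads=8):
        super().__init__()
        self.adapter = nn.Sequential(            # bridge LLM width -> decoder width
            nn.Linear(llm_dim, d_model), nn.GELU(), nn.Linear(d_model, d_model))
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, llm_hidden_states, tgt_tokens):
        memory = self.adapter(llm_hidden_states)              # [B, S, d_model]
        tgt = self.tgt_embed(tgt_tokens)                      # [B, T, d_model]
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out_proj(out)                             # [B, T, vocab]
```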
2024
Revisiting Interpolation Augmentation for Speech-to-Text Generation
Chen Xu | Jie Wang | Xiaoqian Liu | Qian Dong | Chunliang Zhang | Tong Xiao | JingBo Zhu | Dapeng Man | Wu Yang
Findings of the Association for Computational Linguistics: ACL 2024
Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique’s application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.
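A minimal sketch of what such interpolation can look like, assuming both utterances are padded to the same length and labels are given as soft distributions; the Beta-distributed mixing weight follows common mixup practice rather than the paper's exact recipe.

```python
import torch

def interpolate_batch(feats_a, feats_b, labels_a, labels_b, alpha=0.2):
    """Mixup-style interpolation for S2T training (a sketch only).

    `feats_*` are padded acoustic feature tensors [B, T, F]; `labels_*` are
    one-hot / soft target distributions [B, L, V]. A single lambda is drawn
    per batch and applied to inputs and labels alike.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed_feats = lam * feats_a + (1.0 - lam) * feats_b
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b
    return mixed_feats, mixed_labels, lam
```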
2023
CTC-based Non-autoregressive Speech Translation
Chen Xu | Xiaoqian Liu | Xiaowen Liu | Qingxuan Sun | Yuhao Zhang | Murun Yang | Qianqian Dong | Tom Ko | Mingxuan Wang | Tong Xiao | Anxiang Ma | Jingbo Zhu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Combining end-to-end speech translation (ST) and non-autoregressive (NAR) generation is promising in language and speech processing for its advantages of less error propagation and low latency. In this paper, we investigate the potential of connectionist temporal classification (CTC) for non-autoregressive speech translation (NAST). In particular, we develop a model consisting of two encoders that are guided by CTC to predict the source and target texts, respectively. Introducing CTC into NAST on both language sides has obvious challenges: 1) the conditionally independent generation somewhat breaks the interdependency among tokens, and 2) the monotonic alignment assumption in standard CTC does not hold in translation tasks. In response, we develop a prediction-aware encoding approach and a cross-layer attention approach to address these issues. We also use curriculum learning to improve training convergence. Experiments on the MuST-C ST benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67×, which is comparable to the autoregressive counterpart and even outperforms the previous best result by 0.9 BLEU points.
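The sketch below illustrates the two-encoder, doubly CTC-supervised idea in plain PyTorch; the layer sizes, the simple encoder stacking, and the omission of the paper's prediction-aware encoding and cross-layer attention are simplifying assumptions.

```python
import torch
import torch.nn as nn

class DualCTCEncoders(nn.Module):
    """Sketch: an acoustic encoder supervised by CTC on source text, and a
    second encoder stacked on top supervised by CTC on target text."""

    def __init__(self, feat_dim=80, d_model=256, src_vocab=1000, tgt_vocab=1000):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.in_proj = nn.Linear(feat_dim, d_model)
        self.src_encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.tgt_encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.src_ctc = nn.Linear(d_model, src_vocab)   # blank = index 0
        self.tgt_ctc = nn.Linear(d_model, tgt_vocab)
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, src_text, tgt_text, in_lens, src_lens, tgt_lens):
        h_src = self.src_encoder(self.in_proj(feats))                   # [B, T, d]
        h_tgt = self.tgt_encoder(h_src)                                 # stacked on top
        src_logp = self.src_ctc(h_src).log_softmax(-1).transpose(0, 1)  # [T, B, V]
        tgt_logp = self.tgt_ctc(h_tgt).log_softmax(-1).transpose(0, 1)
        return (self.ctc_loss(src_logp, src_text, in_lens, src_lens)
                + self.ctc_loss(tgt_logp, tgt_text, in_lens, tgt_lens))
```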
Bridging the Granularity Gap for Acoustic Modeling
Chen Xu | Yuhao Zhang | Chengbo Jiao | Xiaoqian Liu | Chi Hu | Xin Zeng | Tong Xiao | Anxiang Ma | Huizhen Wang | Jingbo Zhu
Findings of the Association for Computational Linguistics: ACL 2023
While the Transformer has become the de-facto standard for speech, modeling fine-grained frame-level features remains an open challenge in capturing long-distance dependencies and distributing attention weights. We propose Progressive Down-Sampling (PDS), which gradually compresses the acoustic features into coarser-grained units containing more complete semantic information, similar to text-level representations. In addition, we develop a representation fusion method to alleviate the information loss that inevitably occurs during high compression. In this way, we compress the acoustic features to 1/32 of the initial length while achieving better or comparable performance on the speech recognition task. As a bonus, it yields inference speedups ranging from 1.20x to 1.47x. By reducing the modeling burden, we also achieve competitive results when training on the more challenging speech translation task.
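A minimal sketch of progressive down-sampling with stride-2 convolutions and a naive average fusion of intermediate stages; the stage widths and the fusion rule are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ProgressiveDownSampling(nn.Module):
    """Sketch: five stride-2 Conv1d stages compress a frame-level sequence
    to roughly 1/32 of its length, then intermediate stages are pooled to
    the final length and averaged as a simple stand-in for fusion."""

    def __init__(self, feat_dim=80, d_model=256, num_stages=5):
        super().__init__()
        dims = [feat_dim] + [d_model] * num_stages
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv1d(dims[i], dims[i + 1], kernel_size=3,
                                    stride=2, padding=1), nn.GELU())
            for i in range(num_stages))

    def forward(self, feats):                      # feats: [B, T, feat_dim]
        x = feats.transpose(1, 2)                  # Conv1d expects [B, C, T]
        stage_outputs = []
        for stage in self.stages:
            x = stage(x)
            stage_outputs.append(x)
        final_len = stage_outputs[-1].size(-1)
        fused = sum(nn.functional.adaptive_avg_pool1d(h, final_len)
                    for h in stage_outputs) / len(stage_outputs)
        return fused.transpose(1, 2)               # [B, ~T/32, d_model]
```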
The NiuTrans End-to-End Speech Translation System for IWSLT23 English-to-Chinese Offline Task
Yuchen Han | Xiaoqian Liu | Hao Chen | Yuhao Zhang | Chen Xu | Tong Xiao | Jingbo Zhu
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
This paper describes the NiuTrans end-to-end speech translation system submitted for the IWSLT 2023 English-to-Chinese offline task. Our speech translation models are composed of pre-trained ASR and MT models under the SATE framework. Several pre-trained models with diverse architectures and input representations (e.g., log Mel-filterbank and waveform) were utilized. We proposed an IDA method to iteratively improve the performance of the MT models and to generate pseudo ST data with the MT systems. We then trained ST models with different structures and data settings to enhance ensemble performance. Experimental results demonstrate that our NiuTrans system achieved a BLEU score of 29.22 on the MuST-C En-Zh tst-COMMON set, outperforming the previous year’s submission by 0.12 BLEU despite using less MT training data.
2022
The NiuTrans’s Submission to the IWSLT22 English-to-Chinese Offline Speech Translation Task
Yuhao Zhang | Canan Huang | Chen Xu | Xiaoqian Liu | Bei Li | Anxiang Ma | Tong Xiao | Jingbo Zhu
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
This paper describes NiuTrans’s submission to the IWSLT22 English-to-Chinese (En-Zh) offline speech translation task. The end-to-end, bilingual system is built with constrained English and Chinese data and translates English speech to Chinese text without intermediate transcription. Our speech translation models combine different pre-trained acoustic models and machine translation models through two kinds of adapters. We compared the effect of standard speech features (e.g., log Mel-filterbank) and pre-trained speech features, and tried to make them interact. The final submission is an ensemble of three potential speech translation models. Our single best model and ensemble achieve 18.66 and 19.35 BLEU, respectively, on the MuST-C En-Zh tst-COMMON set.
2021
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
Chen Xu | Xiaoqian Liu | Xiaowen Liu | Tiger Wang | Canan Huang | Tong Xiao | Jingbo Zhu
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates English audio directly into German text without intermediate transcription. We use a Transformer-based model architecture and enhance it with Conformer, relative position encoding, and stacked acoustic and textual encoding. To augment the training data, the English transcriptions are translated into German. Finally, we employ ensemble decoding to integrate the predictions from several models trained on different datasets. Combining these techniques, we achieve 33.84 BLEU points on the MuST-C En-De test set, which shows the enormous potential of the end-to-end model.
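Ensemble decoding here means combining the predictive distributions of several independently trained models at each step. The sketch below averages next-token probabilities under an assumed `model.step()` interface that is purely illustrative, not any particular toolkit's API.

```python
import torch

def ensemble_next_token(models, decoder_inputs, memories):
    """Average next-token probabilities from several models (greedy pick;
    real systems would plug this distribution into beam search)."""
    probs = None
    for model, memory in zip(models, memories):
        logits = model.step(decoder_inputs, memory)          # [B, V], assumed interface
        p = torch.softmax(logits, dim=-1)
        probs = p if probs is None else probs + p
    probs = probs / len(models)
    return probs.argmax(dim=-1)
```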
2020
Pretrain-KGE: Learning Knowledge Representation from Pretrained Language Models
Zhiyuan Zhang | Xiaoqian Liu | Yi Zhang | Qi Su | Xu Sun | Bin He
Findings of the Association for Computational Linguistics: EMNLP 2020
Conventional knowledge graph embedding (KGE) often suffers from limited knowledge representation, leading to performance degradation, especially in low-resource settings. To remedy this, we propose to enrich knowledge representation via pretrained language models by leveraging their world knowledge. Specifically, we present a universal training framework named Pretrain-KGE consisting of three phases: a semantic-based fine-tuning phase, a knowledge extraction phase, and a KGE training phase. Extensive experiments show that our proposed Pretrain-KGE improves results over KGE models, especially in low-resource settings.
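A minimal sketch of the initialization idea, assuming entity and relation description encodings from a pretrained LM are supplied as tensors: they are projected to the KGE dimension once and then trained with an ordinary TransE-style margin loss. The projection and loss details are illustrative, not the framework's exact phases.

```python
import torch
import torch.nn as nn

class PretrainInitKGE(nn.Module):
    """Sketch: KGE tables initialized from LM encodings of entity/relation
    descriptions, then trained as ordinary TransE-style parameters."""

    def __init__(self, ent_init, rel_init, dim=200, margin=1.0):
        super().__init__()
        # project LM description encodings down to the KGE dimension once
        ent_proj = nn.Linear(ent_init.size(1), dim)
        rel_proj = nn.Linear(rel_init.size(1), dim)
        self.ent = nn.Parameter(ent_proj(ent_init).detach())
        self.rel = nn.Parameter(rel_proj(rel_init).detach())
        self.margin = margin

    def score(self, h, r, t):
        # TransE: || h + r - t ||_1, lower means more plausible
        return (self.ent[h] + self.rel[r] - self.ent[t]).abs().sum(-1)

    def forward(self, pos, neg):
        # pos / neg are (head, relation, tail) index-tensor triples
        pos_s = self.score(*pos)
        neg_s = self.score(*neg)
        return torch.relu(self.margin + pos_s - neg_s).mean()
```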
The NiuTrans Machine Translation Systems for WMT20
Yuhao Zhang | Ziyang Wang | Runzhe Cao | Binghao Wei | Weiqiao Shan | Shuhan Zhou | Abudurexiti Reheman | Tao Zhou | Xin Zeng | Laohu Wang | Yongyu Mu | Jingnan Zhang | Xiaoqian Liu | Xuanjun Zhou | Yinqiao Li | Bei Li | Tong Xiao | Jingbo Zhu
Proceedings of the Fifth Conference on Machine Translation
This paper describes the NiuTrans neural machine translation systems for the WMT20 news translation tasks. We participated in five tasks in total: Japanese<->English (both directions), English->Chinese, Inuktitut->English, and Tamil->English, and ranked first in both Japanese<->English directions. We mainly utilized iterative back-translation, model architectures of varying depth and width, iterative knowledge distillation, and iterative fine-tuning. We find that adequately widening and deepening the model at the same time significantly improves performance. The iterative fine-tuning strategy we implemented is also effective for domain adaptation. For the Inuktitut->English and Tamil->English tasks, we built separate multilingual models and employed pretrained word embeddings to obtain better performance.
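A schematic of the iterative back-translation loop mentioned above, with placeholder training and translation callables rather than any particular toolkit's API:

```python
def iterative_back_translation(fwd_train, bwd_train, translate,
                               parallel_data, mono_src, mono_tgt, rounds=3):
    """Sketch: each round, the backward model back-translates target
    monolingual data into synthetic source sentences, the forward model is
    retrained on real plus synthetic pairs, and the roles are mirrored.

    `parallel_data` is a list of (src, tgt) pairs; `fwd_train` / `bwd_train`
    / `translate` are assumed training and decoding callables.
    """
    fwd = fwd_train(parallel_data)
    bwd = bwd_train([(t, s) for s, t in parallel_data])
    for _ in range(rounds):
        synth_src = [(translate(bwd, t), t) for t in mono_tgt]   # tgt -> synthetic src
        fwd = fwd_train(parallel_data + synth_src)
        synth_tgt = [(translate(fwd, s), s) for s in mono_src]   # src -> synthetic tgt
        bwd = bwd_train([(t, s) for s, t in parallel_data] + synth_tgt)
    return fwd, bwd
```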