Junli Wang


2025

pdf bib
Disentangle to Decay: Linear Attention with Trainable Decay Factor
Haibo Tong | Chenyang Zhang | Jiayi Lin | Bingxuan Hou | Qingqing Hong | Junli Wang
Proceedings of the 31st International Conference on Computational Linguistics

Linear attention enhances inference efficiency of Transformer and has attracted research interests as an efficient backbone of language models. Existing linear attention based models usually exploit decay factor based positional encoding (PE), where attention scores decay exponentially with increasing relative distance. However, most work manually designs a non-trainable decay factor of exponential calculation, which limits further optimization. Our analysis reveals directly training decay factor is unstable because of large gradients. To address this, we propose a novel PE for linear attention named Disentangle to Decay (D2D). D2D disentangles decay factor into two parts to achieve further optimization and stable training. Moreover, D2D can be transformed into recurrent form for efficient inference. Experiments demonstrate that D2D achieves stable training of decay factor, and enhances performance of linear attention in both normal context length and length extrapolation scenarios.

pdf bib
A Lightweight Multi Aspect Controlled Text Generation Solution For Large Language Models
Chenyang Zhang | Jiayi Lin | Haibo Tong | Bingxuan Hou | Dongyu Zhang | Jialin Li | Junli Wang
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)

Multi-Aspect Controllable Text Generation (MCTG) introduces fine-grained multiple constraints in natural language generation, i.e. control attributes in topics, sentiments, and detoxification.MCTG demonstrates application prospects for trustworthy generation of Large Language Models (LLMs) but is limited by generalization issues.Existing work exploits additional structures and strategies for solutions, requiring LLMs’ modifications.To activate LLMs’ MCTG ability, we propose a lightweight MCTG pipeline based on data augmentation and instruction tuning.We analyze aspect bias and correlations in traditional datasets and address these concerns with augmented control attributes and sentences.Augmented datasets are feasible for instruction tuning.We conduct experiments for various LLMs backbone and parameter sizes, demonstrating general effectiveness on MCTG performance.

2024

pdf bib
Correct after Answer: Enhancing Multi-Span Question Answering with Post-Processing Method
Jiayi Lin | Chenyang Zhang | Haibo Tong | Dongyu Zhang | Qingqing Hong | Bingxuan Hou | Junli Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

2022

pdf bib
Non-Autoregressive Neural Machine Translation with Consistency Regularization Optimized Variational Framework
Minghao Zhu | Junli Wang | Chungang Yan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Variational Autoencoder (VAE) is an effective framework to model the interdependency for non-autoregressive neural machine translation (NAT). One of the prominent VAE-based NAT frameworks, LaNMT, achieves great improvements to vanilla models, but still suffers from two main issues which lower down the translation quality: (1) mismatch between training and inference circumstances and (2) inadequacy of latent representations. In this work, we target on addressing these issues by proposing posterior consistency regularization. Specifically, we first perform stochastic data augmentation on the input samples to better adapt the model for inference circumstance, and then conduct consistency training on posterior latent variables to construct a more robust latent representations without any expansion on latent size. Experiments on En<->De and En<->Ro benchmarks confirm the effectiveness of our methods with about 1.5/0.7 and 0.8/0.3 BLEU points improvement to the baseline model with about 12.6× faster than autoregressive Transformer.