2025
Emotion Transfer with Enhanced Prototype for Unseen Emotion Recognition in Conversation
Kun Peng | Cong Cao | Hao Peng | Guanlin Wu | Zhifeng Hao | Lei Jiang | Yanbing Liu | Philip S. Yu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Current Emotion Recognition in Conversation (ERC) research follows a closed-domain assumption. However, there is no clear consensus on emotion classification in psychology, which makes it challenging for models to recognize previously unseen emotions in real-world applications. To bridge this gap, we introduce the Unseen Emotion Recognition in Conversation (UERC) task for the first time and propose **ProEmoTrans**, a solid prototype-based emotion transfer framework. This prototype-based approach shows promise but still faces key challenges: First, implicit expressions complicate emotion definition, which we address by proposing an LLM-enhanced description approach. Second, utterance encoding in long conversations is difficult, which we tackle with a proposed parameter-free mechanism for efficient encoding and overfitting prevention. Finally, the Markovian flow nature of emotions is hard to transfer, which we address with an improved Attention Viterbi Decoding (AVD) method to transfer seen emotion transitions to unseen emotions. Extensive experiments on three datasets show that our method serves as a strong baseline for preliminary exploration of this new area.
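The improved Attention Viterbi Decoding mentioned above is not detailed in this abstract, but its backbone, decoding an emotion sequence for a conversation from per-utterance scores and an emotion-transition matrix, is classic Viterbi decoding. Below is a minimal sketch of that backbone only, assuming hypothetical prototype-similarity scores and a transition matrix; it is not the authors' ProEmoTrans/AVD implementation.

```python
# Illustrative Viterbi decoding over per-utterance emotion scores.
# NOT the paper's ProEmoTrans/AVD method; emission_scores and transitions
# are hypothetical stand-ins for prototype similarities and (transferred)
# emotion-transition weights.
import numpy as np

def viterbi_decode(emission_scores: np.ndarray, transitions: np.ndarray) -> list[int]:
    """emission_scores: (T, K) log-scores of K emotions for T utterances.
    transitions: (K, K) log-scores for moving from emotion i to emotion j."""
    T, K = emission_scores.shape
    dp = np.full((T, K), -np.inf)
    backptr = np.zeros((T, K), dtype=int)
    dp[0] = emission_scores[0]
    for t in range(1, T):
        # score of the best path ending in each emotion at step t
        scores = dp[t - 1][:, None] + transitions + emission_scores[t][None, :]
        backptr[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0)
    # backtrack the best emotion sequence
    best = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emissions = rng.normal(size=(6, 4))   # 6 utterances, 4 emotion prototypes
    trans = rng.normal(size=(4, 4))
    print(viterbi_decode(emissions, trans))
```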
SenDetEX: Sentence-Level AI-Generated Text Detection for Human-AI Hybrid Content via Style and Context Fusion
Lei Jiang | Desheng Wu | Xiaolong Zheng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Text generated by Large Language Models (LLMs) now rivals human writing, raising concerns about its misuse. However, mainstream AI-generated text detection (AGTD) methods primarily target document-level long texts and struggle to generalize effectively to sentence-level short texts. Moreover, current sentence-level AGTD (S-AGTD) research faces two significant limitations: (1) lack of a comprehensive evaluation on complex human-AI hybrid content, where human-written text (HWT) and AI-generated text (AGT) alternate irregularly, and (2) failure to incorporate contextual information, which serves as a crucial supplementary feature for identifying the origin of the detected sentence. Therefore, in our work, we propose AutoFill-Refine, a high-quality synthesis strategy for human-AI hybrid texts, and then construct a dedicated S-AGTD benchmark dataset. Besides, we introduce SenDetEX, a novel framework for sentence-level AI-generated text detection via style and context fusion. Extensive experiments demonstrate that SenDetEX significantly outperforms all baseline models in detection accuracy, while exhibiting remarkable transferability and robustness. Source code is available at https://github.com/TristoneJiang/SenDetEX.
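As a rough illustration of what fusing a sentence with its context can look like at the sentence level, the sketch below concatenates each sentence embedding with its neighbors before classification. The window size, classifier head, and reliance on precomputed embeddings are assumptions for illustration, not SenDetEX's actual architecture (see the linked repository for that).

```python
# Hypothetical sketch of sentence-level detection with context fusion:
# each sentence embedding is concatenated with its left/right neighbors
# and classified as human-written vs. AI-generated. Not SenDetEX itself.
import torch
import torch.nn as nn

class ContextFusionDetector(nn.Module):
    def __init__(self, dim: int = 256, window: int = 1):
        super().__init__()
        self.window = window
        # fuse target sentence + left/right context embeddings
        self.classifier = nn.Sequential(
            nn.Linear(dim * (2 * window + 1), dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, sent_embs: torch.Tensor) -> torch.Tensor:
        """sent_embs: (num_sentences, dim) precomputed sentence embeddings.
        Returns (num_sentences, 2) logits."""
        n, d = sent_embs.shape
        padded = torch.cat([torch.zeros(self.window, d), sent_embs,
                            torch.zeros(self.window, d)], dim=0)
        fused = torch.cat([padded[i:i + n] for i in range(2 * self.window + 1)], dim=1)
        return self.classifier(fused)

if __name__ == "__main__":
    torch.manual_seed(0)
    logits = ContextFusionDetector()(torch.randn(5, 256))
    print(logits.shape)   # torch.Size([5, 2])
```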
DCP: Dual-Cue Pruning for Efficient Large Vision-Language Models
Lei Jiang | Zixun Zhang | Yuting Zeng | Chunzhao Xie | Tongxuan Liu | Zhen Li | Lechao Cheng | Xiaohua Xu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Vision-Language Models (LVLMs) achieve remarkable performance in multimodal tasks but suffer from high computational costs due to the large number of visual tokens. Existing pruning methods either operate after the visual tokens enter the LLM or perform pre-pruning based solely on visual attention. Both fail to balance efficiency and semantic alignment, as post-pruning incurs redundant computation, while visual-only pre-pruning overlooks multimodal relevance. To address this limitation, we propose Dual-Cue Pruning (DCP), a novel cross-modal pruning framework that jointly considers textual semantics and visual self-attention. DCP consists of a text-aware computation module, which employs a gradient-weighted attention mechanism to enhance text-visual alignment, and an image-aware computation module, which utilizes deep-layer self-attention distributions to retain essential structural information. By integrating both cues, DCP adaptively selects the most informative visual tokens, achieving efficient inference acceleration while maintaining strong task performance. Experimental results show that DCP can retain only 25% of the visual tokens, with a minimal performance degradation of only 0.063% on LLaVA-1.5-13B, demonstrating its effectiveness in balancing efficiency and accuracy.
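As a rough illustration of combining a text-relevance cue with a visual self-attention cue for token selection, the sketch below fuses two normalized scores and keeps the top 25% of visual tokens. The equal weighting and the score definitions are assumptions, not DCP's actual computation.

```python
# Illustrative dual-cue visual-token selection. Hypothetical sketch, not the
# DCP implementation; the fusion weights and score definitions are assumptions.
import torch

def select_visual_tokens(text_attn: torch.Tensor,
                         image_self_attn: torch.Tensor,
                         keep_ratio: float = 0.25) -> torch.Tensor:
    """text_attn: (N,) relevance of each of N visual tokens to the text prompt.
    image_self_attn: (N, N) self-attention among visual tokens.
    Returns sorted indices of the kept tokens."""
    # visual cue: how much attention each token receives from the others
    visual_score = image_self_attn.mean(dim=0)

    def norm(x):
        # normalize a cue to [0, 1] before fusing
        return (x - x.min()) / (x.max() - x.min() + 1e-6)

    fused = 0.5 * norm(text_attn) + 0.5 * norm(visual_score)
    k = max(1, int(keep_ratio * fused.numel()))
    return fused.topk(k).indices.sort().values

if __name__ == "__main__":
    torch.manual_seed(0)
    n_tokens = 576  # e.g. a 24x24 visual token grid
    keep = select_visual_tokens(torch.rand(n_tokens), torch.rand(n_tokens, n_tokens))
    print(len(keep), "of", n_tokens, "tokens kept")
```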
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs
Jiancheng Dong | Lei Jiang | Wei Jin | Lu Cheng
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Packing for Supervised Fine-Tuning (SFT) in autoregressive models involves concatenating data points of varying lengths until the designated maximum length is reached, to facilitate GPU processing. However, randomly concatenating data points can lead to cross-contamination of sequences because of the significant differences in their subject matter. The mainstream approaches in SFT ensure that each token in the attention calculation phase only attends to tokens within its own short sequence, without providing additional learning signals for the preceding context. To address these challenges, we introduce Threshold Filtering Packing (TFP), a method that selects samples with related context while maintaining sufficient diversity within the same pack. Our experiments show that TFP offers a simple-to-implement and scalable approach that significantly enhances SFT performance, with observed improvements of up to 7% on GSM8K and 4% on HumanEval. Furthermore, results from bias benchmark datasets highlight TFP’s promising performance in improving fairness while also boosting prediction accuracy by 15%.
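One way to picture similarity-aware packing is a greedy loop that keeps appending a related, but not near-duplicate, sample until the token budget is full. The sketch below is an interpretation of that idea under assumed similarity thresholds and a greedy policy; it is not the TFP algorithm itself.

```python
# Hypothetical sketch of similarity-aware packing: greedily extend a pack with
# a related (but not near-duplicate) sample until the token budget is reached.
# Thresholds and the greedy policy are assumptions, not TFP's exact procedure.
import numpy as np

def pack_samples(embeddings: np.ndarray, lengths: list[int],
                 max_len: int = 2048, low: float = 0.1, high: float = 0.9):
    """embeddings: (N, d) unit-normalized sample embeddings.
    lengths: token length of each sample. Returns a list of packs (index lists)."""
    remaining = set(range(len(lengths)))
    packs = []
    while remaining:
        seed = remaining.pop()
        pack, used = [seed], lengths[seed]
        while True:
            sims = {j: float(embeddings[pack[-1]] @ embeddings[j]) for j in remaining}
            # keep candidates that are related but not near-duplicates and still fit
            cands = [j for j, s in sims.items()
                     if low <= s <= high and used + lengths[j] <= max_len]
            if not cands:
                break
            nxt = max(cands, key=lambda j: sims[j])
            remaining.remove(nxt)
            pack.append(nxt)
            used += lengths[nxt]
        packs.append(pack)
    return packs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(50, 8))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    lens = rng.integers(100, 800, size=50).tolist()
    print(len(pack_samples(emb, lens)), "packs built from 50 samples")
```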
S2-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency
Yuting Zeng | Weizhe Huang | Lei Jiang | Tongxuan Liu | XiTai Jin | Chen Tianying Tiana | Jing Li | Xiaohua Xu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large language models (LLMs) have demonstrated remarkable capabilities across various natural language processing (NLP) scenarios, but they still face challenges when handling complex arithmetic and logical reasoning tasks. While Chain-Of-Thought (CoT) reasoning, self-consistency (SC) and self-correction strategies have attempted to guide models in sequential, multi-step reasoning, Multi-agent Debate (MAD) has emerged as a viable approach for enhancing the reasoning capabilities of LLMs. By increasing both the number of agents and the frequency of debates, the performance of LLMs improves significantly. However, this strategy results in a significant increase in token costs, presenting a barrier to scalability. To address this challenge, we introduce a novel sparsification strategy designed to reduce token costs within MAD. This approach minimizes ineffective exchanges of information and unproductive discussions among agents, thereby enhancing the overall efficiency of the debate process. We conduct comparative experiments on multiple datasets across various models, demonstrating that our approach significantly reduces token costs in MAD. Specifically, compared to MAD, our approach achieves an impressive reduction of up to 94.5% in token costs while keeping performance degradation below 2.0%.
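The sparsification idea can be pictured as a debate round in which an agent only consumes (and pays tokens for) peer responses that disagree with its own. The gating rule and prompt format below are assumptions for illustration, not the paper's actual mechanism.

```python
# Hypothetical sketch of a sparsified multi-agent debate round: an agent sees
# only the peers it disagrees with; if everyone agrees, the exchange is skipped.
# The agreement test and prompting format are assumptions, not S2-MAD's design.
from typing import Callable

def debate_round(agents: list[Callable[[str], str]],
                 question: str,
                 answers: list[str],
                 same: Callable[[str, str], bool]) -> list[str]:
    """One sparsified round: each agent sees only the peers it disagrees with."""
    new_answers = []
    for i, agent in enumerate(agents):
        peers = [a for j, a in enumerate(answers) if j != i and not same(a, answers[i])]
        if not peers:                      # everyone agrees: skip the exchange entirely
            new_answers.append(answers[i])
            continue
        prompt = (f"Question: {question}\n"
                  f"Your previous answer: {answers[i]}\n"
                  "Other agents answered differently:\n"
                  + "\n".join(f"- {p}" for p in peers)
                  + "\nReconsider and give your final answer.")
        new_answers.append(agent(prompt))
    return new_answers

if __name__ == "__main__":
    # toy agents that echo a fixed answer, just to show the control flow
    agents = [lambda p: "42", lambda p: "42", lambda p: "41"]
    out = debate_round(agents, "6 * 7 = ?", ["42", "42", "41"],
                       same=lambda a, b: a.strip() == b.strip())
    print(out)
```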
2024
TrojFSP: Trojan Insertion in Few-shot Prompt Tuning
Mengxin Zheng | Jiaqi Xue | Xun Chen | Yanshan Wang | Qian Lou | Lei Jiang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Prompt tuning is one of the most effective solutions for adapting a fixed pre-trained language model (PLM) to various downstream tasks, especially with only a few input samples. However, the security issues, e.g., Trojan attacks, of prompt tuning on a few data samples are not well-studied. Transferring established data poisoning attacks directly to few-shot prompt tuning presents multiple challenges. One significant issue is the _poisoned imbalance issue_, where non-target class samples are added to the target class, resulting in a greater number of target-class samples than non-target-class samples. While this issue is not critical in regular tuning, it significantly hampers few-shot prompt tuning, making it difficult to simultaneously achieve a high attack success rate (ASR) and maintain clean data accuracy (CDA). Additionally, few-shot prompting is prone to overfitting in terms of both ASR and CDA. In this paper, we introduce _TrojFSP_, a method designed to address these challenges. To solve the poisoned imbalance issue, we develop a _Target-Class Shrink (TC-Shrink)_ technique, which aims to equalize the number of poisoning samples. To combat overfitting, we employ a _Selective Token Poisoning_ technique to boost attack performance. Furthermore, we introduce a _Trojan-Trigger Attention_ objective function to amplify the attention of the poisoned trojan prompt on triggers. Experiments show that our TrojFSP achieves an ASR of over 99% while maintaining negligible decreases in CDA across various PLMs and datasets. The source code of TrojFSP is available at _https://github.com/UCF-ML-Research/TrojFSP_.
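As a benign illustration of the class-balancing intuition behind Target-Class Shrink, the sketch below subsamples the original target-class examples so that later additions to that class do not inflate its size relative to the others. The sampling policy is an assumption, not the paper's procedure, and the attack-specific components are deliberately omitted.

```python
# Illustrative class-balancing sketch: shrink the target class by a given
# number of examples so the final class sizes stay balanced. An assumed,
# simplified stand-in for TC-Shrink; no attack logic is included.
import random
from collections import defaultdict

def shrink_target_class(dataset: list[tuple[str, int]],
                        target_label: int,
                        num_added: int,
                        seed: int = 0) -> list[tuple[str, int]]:
    """dataset: list of (text, label). Returns the dataset with the target
    class shrunk by num_added examples (never below zero)."""
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    target = by_label[target_label]
    rng.shuffle(target)
    kept_target = target[:max(0, len(target) - num_added)]
    others = [ex for lbl, exs in by_label.items() if lbl != target_label for ex in exs]
    return others + kept_target

if __name__ == "__main__":
    data = [(f"sample {i}", i % 2) for i in range(16)]   # 8 examples per class
    balanced = shrink_target_class(data, target_label=1, num_added=3)
    print(sum(1 for _, y in balanced if y == 1), "target-class samples remain")
```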
2022
Dynamic Nonlinear Mixup with Distance-based Sample Selection
Shaokang Zhang | Lei Jiang | Jianlong Tan
Proceedings of the 29th International Conference on Computational Linguistics
Data augmentation with mixup has been shown to be effective on NLP tasks. Despite its great success, mixup still has shortcomings. First, vanilla mixup randomly selects one sample to generate the mixup sample for a given sample; it remains unclear how to best choose the input samples for mixup. Second, linear interpolation limits the space of synthetic data and its regularization effect. In this paper, we propose dynamic nonlinear mixup with distance-based sample selection, which not only generates multiple sample pairs based on the distance between samples but also enlarges the space of synthetic samples. Specifically, we compute the distance between input samples using cosine similarity and select multiple samples for a given sample. Then we use dynamic nonlinear mixup to fuse sample pairs. Instead of a linear, scalar mixing strategy, it uses a nonlinear interpolation strategy in which the mixing is adaptively updated for the input and label pairs. Experiments on multiple public datasets demonstrate that dynamic nonlinear mixup outperforms state-of-the-art methods.
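A minimal sketch of the two ingredients, cosine-similarity-based partner selection and element-wise (rather than scalar) mixing, is given below. The per-dimension Beta-sampled mixing mask is an assumption for illustration; the paper adapts its mixing strategy rather than sampling it randomly.

```python
# Illustrative distance-based partner selection plus nonlinear (element-wise)
# mixing. The random per-dimension mask is an assumption, not the paper's
# adaptive mixing policy.
import numpy as np

def select_partners(features: np.ndarray, k: int = 3) -> np.ndarray:
    """Return, for each sample, the indices of its k most cosine-similar samples."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)            # exclude the sample itself
    return np.argsort(-sims, axis=1)[:, :k]

def nonlinear_mixup(x_i, y_i, x_j, y_j, alpha: float = 0.4, rng=None):
    """Mix two (feature, one-hot label) pairs with a per-dimension mask."""
    rng = rng or np.random.default_rng()
    lam_x = rng.beta(alpha, alpha, size=x_i.shape)   # a vector, not a scalar
    lam_y = float(lam_x.mean())                      # labels mixed by the mean weight
    return lam_x * x_i + (1 - lam_x) * x_j, lam_y * y_i + (1 - lam_y) * y_j

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(10, 8))
    labels = np.eye(2)[rng.integers(0, 2, size=10)]
    partners = select_partners(feats, k=3)
    x_mix, y_mix = nonlinear_mixup(feats[0], labels[0],
                                   feats[partners[0, 0]], labels[partners[0, 0]],
                                   rng=rng)
    print(x_mix.shape, y_mix)
```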
Aspect Is Not You Need: No-aspect Differential Sentiment Framework for Aspect-based Sentiment Analysis
Jiahao Cao | Rui Liu | Huailiang Peng | Lei Jiang | Xu Bai
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment classification task. Most recent efforts adopt pre-trained models to classify sentences with aspects. However, the aspect sentiment bias from pre-trained models brings some noise to the ABSA task. Besides, traditional methods using cross-entropy loss struggle to find the potential associations between sentiment polarities. In this work, we analyze the ABSA task from a novel cognition perspective: humans can often judge the sentiment of an aspect even if they do not know what the aspect is. Moreover, it is easier for human beings to distinguish positive and negative sentiments than others because positive and negative are two opposite sentiments. To this end, we propose a no-aspect differential sentiment (NADS) framework for the ABSA task. We first design a no-aspect template by replacing the aspect with a special unbiased character to eliminate the sentiment bias and obtain a stronger representation. To better benefit from the template, we adopt contrastive learning between the no-aspect template and the original sentence. Then we propose a differential sentiment loss instead of the cross-entropy loss to better classify sentiments by distinguishing the different distances between them. Our proposed model is a general framework and can be combined with almost all traditional ABSA methods. Experiments on SemEval 2014 show that our framework is still able to predict the sentiment of the aspect even if we do not know what the aspect is. Moreover, our NADS framework boosts three typical ABSA methods and achieves state-of-the-art performance.
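The no-aspect template itself is easy to picture: the aspect span is replaced by an unbiased placeholder, yielding the sentence pair that a contrastive objective can compare. The placeholder string below is an assumption; the contrastive and differential sentiment losses are not reproduced here.

```python
# Illustrative construction of a "no-aspect" view of a sentence. The
# placeholder token is an assumption, not the paper's special character.
def build_no_aspect_pair(sentence: str, aspect: str,
                         placeholder: str = "[UNBIASED]") -> tuple[str, str]:
    """Return (original sentence, no-aspect template sentence)."""
    if aspect not in sentence:
        raise ValueError(f"aspect {aspect!r} not found in sentence")
    return sentence, sentence.replace(aspect, placeholder)

if __name__ == "__main__":
    orig, template = build_no_aspect_pair(
        "The battery life is great but the screen is dim.", "battery life")
    print(orig)
    print(template)   # -> "The [UNBIASED] is great but the screen is dim."
```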
2021
CRYPTOGRU: Low Latency Privacy-Preserving Text Analysis With GRU
Bo Feng | Qian Lou | Lei Jiang | Geoffrey Fox
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Homomorphic encryption (HE) and garbled circuits (GC) provide protection for users’ privacy. However, simply mixing HE and GC in RNN models suffers from long inference latency due to slow activation functions. In this paper, we present CryptoGRU, a novel hybrid HE and GC gated recurrent unit (GRU) network for low-latency secure inference. CryptoGRU replaces the computationally expensive GC-based tanh with a fast GC-based ReLU, and then quantizes sigmoid and ReLU to smaller bit-lengths to accelerate activations in a GRU. We evaluate CryptoGRU with multiple GRU models trained on 4 public datasets. Experimental results show that CryptoGRU achieves top-notch accuracy and improves secure inference latency by up to 138× over one of the state-of-the-art secure networks on the Penn Treebank dataset.
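A plaintext sketch of the cell-level change described above (ReLU in place of tanh, plus low-bit quantization of the activations) is shown below. All HE/GC machinery is omitted, and the uniform quantizer is an assumption, not the paper's scheme.

```python
# Plaintext sketch of a GRU cell with a ReLU candidate activation and quantized
# gate activations. No encryption is involved; the quantizer is an assumption.
import numpy as np

def quantize(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Uniformly quantize values assumed to lie in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def gru_cell_relu(x, h, Wz, Uz, Wr, Ur, Wh, Uh, bits: int = 4):
    """One GRU step with quantized sigmoid gates and a quantized ReLU candidate."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = quantize(sigmoid(x @ Wz + h @ Uz), bits)                      # update gate
    r = quantize(sigmoid(x @ Wr + h @ Ur), bits)                      # reset gate
    h_cand = quantize(np.maximum(0.0, x @ Wh + (r * h) @ Uh), bits)   # ReLU, not tanh
    return (1.0 - z) * h + z * h_cand

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_h = 8, 16
    params = [rng.normal(scale=0.1, size=s)
              for s in [(d_in, d_h), (d_h, d_h)] * 3]   # Wz, Uz, Wr, Ur, Wh, Uh
    h = np.zeros(d_h)
    for _ in range(5):
        h = gru_cell_relu(rng.normal(size=d_in), h, *params, bits=4)
    print(h.shape)
```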
2019
Improving Natural Language Understanding by Reverse Mapping Bytepair Encoding
Chaodong Tong | Huailiang Peng | Qiong Dai | Lei Jiang | Jianghua Huang
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
We propose a method called reverse mapping bytepair encoding, which maps named-entity information and other word-level linguistic features back to subwords during the encoding procedure of bytepair encoding (BPE). We apply this method to the Generative Pre-trained Transformer (OpenAI GPT) by adding a weighted linear layer after the embedding layer. We also propose a new model architecture, named the multi-channel separate transformer, to employ a training process without parameter sharing. Evaluation on the Stories Cloze, RTE, SciTail and SST-2 datasets demonstrates the effectiveness of our approach.
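The reverse-mapping idea can be pictured as copying each word-level feature (here, a named-entity tag) onto every subword the tokenizer produces for that word. The toy tokenizer and the copy-to-all-subwords rule below are assumptions for illustration, not the paper's exact procedure.

```python
# Illustrative "reverse mapping" of word-level features onto subwords: each
# word's tag is copied to every subword produced for that word. The toy BPE
# tokenizer is a stand-in, not a real BPE vocabulary.
from typing import Callable

def map_features_to_subwords(words: list[str],
                             word_features: list[str],
                             subword_tokenize: Callable[[str], list[str]]
                             ) -> tuple[list[str], list[str]]:
    """Return parallel lists of subword tokens and their inherited features."""
    assert len(words) == len(word_features)
    subwords, features = [], []
    for word, feat in zip(words, word_features):
        pieces = subword_tokenize(word)
        subwords.extend(pieces)
        features.extend([feat] * len(pieces))   # every piece inherits the word's tag
    return subwords, features

if __name__ == "__main__":
    # toy subword tokenizer standing in for a real BPE vocabulary
    toy_bpe = lambda w: [w[:3] + "@@", w[3:]] if len(w) > 4 else [w]
    toks, tags = map_features_to_subwords(
        ["Geoffrey", "visited", "Bloomington"],
        ["B-PER", "O", "B-LOC"],
        toy_bpe)
    print(list(zip(toks, tags)))
```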