2024
Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding
Kuo Liao | Shuang Li | Meng Zhao | Liqun Liu | Mengge Xue | Zhenyu Hu | Honglin Han | Chengguo Yin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent strides in large language models (LLMs) have yielded remarkable performance, leveraging reinforcement learning from human feedback (RLHF) to significantly enhance generation and alignment capabilities. However, RLHF encounters numerous challenges, including the objective mismatch issue, leading to suboptimal performance in Natural Language Understanding (NLU) tasks. To address this limitation, we propose a novel Reinforcement Learning framework enhanced with Label-sensitive Reward (RLLR) to amplify the performance of LLMs in NLU tasks. By incorporating label-sensitive pairs into reinforcement learning, our method aims to adeptly capture nuanced label-sensitive semantic features during RL, thereby enhancing natural language understanding. Experiments conducted on five diverse foundation models across eight tasks showcase promising results. In comparison to Supervised Fine-tuning (SFT) models, RLLR demonstrates an average performance improvement of 1.54%. Compared with RLHF models, the improvement averages 0.69%. These results reveal the effectiveness of our method for LLMs in NLU tasks.
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
Mengge Xue | Zhenyu Hu | Liqun Liu | Kuo Liao | Shuang Li | Honglin Han | Meng Zhao | Chengguo Yin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous works have investigated the selection bias problem in MCQs within few-shot scenarios, in which the LLM’s performance may be influenced by the presentation of answer choices, leaving the selection bias during Supervised Fine-Tuning (SFT) unexplored. In this paper, we reveal that selection bias persists in the SFT phase, primarily due to the LLM’s inadequate Multiple Choice Symbol Binding (MCSB) ability. This limitation implies that the model struggles to associate the answer options with their corresponding symbols (e.g., A/B/C/D) effectively. To enhance the model’s MCSB capability, we first incorporate option contents into the loss function and subsequently adjust the weights of the option symbols and contents, guiding the model to understand the option content of the current symbol. Based on this, we introduce an efficient SFT algorithm for MCQs, termed Point-wise Intelligent Feedback (PIF). PIF constructs negative instances by randomly combining the incorrect option contents with all candidate symbols, and proposes a point-wise loss to provide feedback on these negative samples to the LLM. Our experimental results demonstrate that PIF significantly reduces the model’s selection bias by improving its MCSB capability. Remarkably, PIF also yields a substantial improvement in accuracy on MCQs.
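The negative-instance step described in this abstract can be illustrated with a minimal sketch. It assumes each incorrect option content is paired with one symbol drawn at random from all candidate symbols; the function name `build_negative_instances`, the sampling scheme, and the example question are hypothetical and are not taken from the paper. The point-wise loss itself is not specified in the abstract, so it is not shown.

```python
import random

def build_negative_instances(symbols, contents, answer_index, rng=None):
    """Pair each incorrect option content with a randomly chosen symbol.

    Assumption: one random symbol per incorrect content. The abstract only
    says incorrect contents are randomly combined with all candidate symbols.
    """
    rng = rng or random.Random()
    negatives = []
    for i, content in enumerate(contents):
        if i == answer_index:
            continue  # the correct content never appears in a negative instance
        symbol = rng.choice(symbols)  # any candidate symbol, including the gold one
        negatives.append((symbol, content))
    return negatives

# Hypothetical question: "What is the capital of France?" with gold answer "A. Paris".
symbols = ["A", "B", "C", "D"]
contents = ["Paris", "Lyon", "Nice", "Lille"]
negatives = build_negative_instances(
    symbols, contents, answer_index=0, rng=random.Random(0)
)
for symbol, content in negatives:
    print(f"{symbol}. {content}")
```

Because every negative pairs a symbol with a wrong content, the resulting string is an incorrect answer regardless of which symbol is drawn, which is what lets such samples serve as negative feedback.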
2023
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao | Yudong Li | Cheng Hou | Jing Zhao | Rong Tian | Weijie Liu | Yiren Chen | Ningyuan Sun | Haoyan Liu | Weiquan Mao | Han Guo | Weigang Gou | Taiqiang Wu | Tao Zhu | Wenhang Shi | Chen Chen | Shan Huang | Sihong Chen | Liqun Liu | Feifei Li | Xiaoshuai Chen | Xingwu Sun | Zhanhui Kang | Xiaoyong Du | Linlin Shen | Kimmo Yan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is the modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. Because almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
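The five-component design described in this abstract can be pictured with a small sketch. This is not TencentPretrain's actual API: the registry, the module names, and `build_spec` are hypothetical, and treating the target embedding and decoder as optional (for encoder-only models) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical module names per component; a real toolkit would map these
# names to model classes rather than just validating strings.
REGISTRY = {
    "embedding": {"word_pos", "patch_pos"},
    "encoder": {"transformer", "lstm"},
    "target_embedding": {"word_pos"},
    "decoder": {"transformer"},
    "target": {"mlm", "lm", "cls"},
}

@dataclass(frozen=True)
class ModelSpec:
    embedding: str
    encoder: str
    target: str
    # Assumption: encoder-only models leave these two components unset.
    target_embedding: Optional[str] = None
    decoder: Optional[str] = None

def build_spec(**choices):
    """Validate one module choice per component and return the combined spec."""
    for component, name in choices.items():
        if component not in REGISTRY:
            raise KeyError(f"unknown component: {component}")
        if name is not None and name not in REGISTRY[component]:
            raise ValueError(f"unknown {component} module: {name}")
    return ModelSpec(**choices)

# An encoder-only model and an encoder-decoder model built from the same parts.
bert_like = build_spec(embedding="word_pos", encoder="transformer", target="mlm")
seq2seq_like = build_spec(
    embedding="word_pos",
    encoder="transformer",
    target_embedding="word_pos",
    decoder="transformer",
    target="lm",
)
print(bert_like)
print(seq2seq_like)
```

Swapping a single argument, such as the embedding or the target, yields a different model specification without touching the other components, which is the kind of reuse the modular design is meant to allow.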
2019
NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit
Liqun Liu | Funan Mu | Pengyu Li | Xin Mu | Jing Tang | Xingsheng Ai | Ran Fu | Lifeng Wang | Xing Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
In this paper, we introduce NeuralClassifier, a toolkit for neural hierarchical multi-label text classification. NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification tasks, which are more challenging and common in real-world scenarios. A salient feature is that NeuralClassifier currently provides a variety of text encoders, such as FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet, and Transformer encoder. It also supports other text classification scenarios, including binary-class and multi-class classification. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with GPU acceleration. Experiments show that models built in our toolkit achieve performance comparable to results reported in the literature.