2024
pdf
abs
Aligning Large Language Models with Human Preferences through Representation Engineering
Wenhao Liu
|
Xiaohua Wang
|
Muling Wu
|
Tianlong Li
|
Changze Lv
|
Zixuan Ling
|
Zhu JianHao
|
Cenyuan Zhang
|
Xiaoqing Zheng
|
Xuanjing Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Aligning large language models (LLMs) with human preferences is crucial for enhancing their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness. Existing methods for achieving this alignment often involve employing reinforcement learning from human feedback (RLHF) to fine-tune LLMs based on human labels assessing the relative quality of model responses. Nevertheless, RLHF is susceptible to instability during fine-tuning and presents challenges in implementation. Drawing inspiration from the emerging field of representation engineering (RepE), this study aims to identify relevant representations for high-level human preferences embedded in patterns of activity within an LLM and achieve precise control of model behavior by transforming its representations. This novel approach, denoted as Representation Alignment from Human Feedback (RAHF), proves to be effective, computationally efficient, and easy to implement. Extensive experiments demonstrate the efficacy of RAHF in not only capturing but also manipulating representations to align with a broad spectrum of human preferences or values, rather than being confined to a singular concept or function (e.g. honesty or bias). RAHF’s versatility in accommodating diverse human preferences shows its potential for advancing LLM performance.
pdf
abs
Advancing Parameter Efficiency in Fine-tuning via Representation Editing
Muling Wu
|
Wenhao Liu
|
Xiaohua Wang
|
Tianlong Li
|
Changze Lv
|
Zixuan Ling
|
Zhu JianHao
|
Cenyuan Zhang
|
Xiaoqing Zheng
|
Xuanjing Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Parameter Efficient Fine-Tuning (PEFT) has gained significant attention for its ability to achieve competitive results while updating only a small subset of trainable parameters. Despite the promising performance of current PEFT methods, they present challenges in hyperparameter selection, such as determining the rank of LoRA or Adapter, or specifying the length of soft prompts. In addressing these challenges, we propose a novel approach to fine-tuning neural models, termed Representation EDiting (RED), which scales and biases the representation produced at each layer. RED substantially reduces the number of trainable parameters by a factor of 25,700 compared to full parameter fine-tuning, and by a factor of 32 compared to LoRA. Remarkably, RED achieves comparable or superior results to full parameter fine-tuning and other PEFT methods. Extensive experiments were conducted across models of varying architectures and scales, including RoBERTa, GPT-2, T5, and Llama-2, and the results demonstrate the efficiency and efficacy of RED, positioning it as a promising PEFT approach for large neural models.
2023
pdf
abs
Enhancing Unsupervised Semantic Parsing with Distributed Contextual Representations
Zixuan Ling
|
Xiaoqing Zheng
|
Jianhan Xu
|
Jinshu Lin
|
Kai-Wei Chang
|
Cho-Jui Hsieh
|
Xuanjing Huang
Findings of the Association for Computational Linguistics: ACL 2023
We extend a non-parametric Bayesian model of (Titov and Klementiev, 2011) to deal with homonymy and polysemy by leveraging distributed contextual word and phrase representations pre-trained on a large collection of unlabelled texts. Then, unsupervised semantic parsing is performed by decomposing sentences into fragments, clustering the fragments to abstract away syntactic variations of the same meaning, and predicting predicate-argument relations between the fragments. To better model the statistical dependencies between predicates and their arguments, we further conduct a hierarchical Pitman-Yor process. An improved Metropolis-Hastings merge-split sampler is proposed to speed up the mixing and convergence of Markov chains by leveraging pre-trained distributed representations. The experimental results show that the models achieve better accuracy on both question-answering and relation extraction tasks.
pdf
abs
Parameter Efficient Multi-task Fine-tuning by Learning to Transfer Token-wise Prompts
Muling Wu
|
Wenhao Liu
|
Jianhan Xu
|
Changze Lv
|
Zixuan Ling
|
Tianlong Li
|
Longtao Huang
|
Xiaoqing Zheng
|
Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2023
Prompt tuning has been proven to be successful on various tasks by incorporating a small number of trainable parameters while freezing large pre-trained language models (PLMs). However, it is still unsettled how to generate more proper prompts for any individual examples and how to extend prompt tuning to multi-task learning scenarios by leveraging cross-task features. To address these challenges, we propose a token-wise prompt tuning (TPT), in which a bank of finer-grained soft prompt tokens is built for multi-task learning by memory network. The tokens are retrieved from the bank against an input example and assembled to an instance-dependent prompt. Extensive experimental results on 14 datasets demonstrated that the models enhanced by our TPT performed far better than full parameter fine-tuned models and achieved state-of-the-art by tuning only 0.035% parameters.