Jeet Gor


2024

Efficiently Acquiring Human Feedback with Bayesian Deep Learning
Haishuo Fang | Jeet Gor | Edwin Simpson
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)

Learning from human feedback can improve models for text generation or passage ranking, aligning them more closely with a user’s needs. Data is often collected by asking users to compare alternative outputs to a given input, which may require a large number of comparisons to learn a ranking function. The number of comparisons needed can be reduced using Bayesian Optimisation (BO) to query the user about only the most promising candidate outputs. Previous applications of BO to text ranking relied on shallow surrogate models to learn ranking functions over candidate outputs, and were therefore unable to fine-tune rankers based on deep, pretrained language models. This paper leverages Bayesian deep learning (BDL) to adapt pretrained language models to highly specialised text ranking tasks, using BO to tune the model with a small number of pairwise preferences between candidate outputs. We apply our approach to community question answering (cQA) and extractive multi-document summarisation (MDS) with simulated noisy users, finding that our BDL approach significantly outperforms both a shallow Gaussian process model and traditional active learning with a standard deep neural network, while remaining robust to noise in the user feedback.
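
The querying idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the upper-confidence-bound rule, the logistic noise model, and all names (`select_pair`, `simulated_noisy_user`, `beta`, `temperature`) are assumptions chosen for illustration. It presumes that some Bayesian ranker has already produced several sampled scores per candidate (for example via repeated stochastic forward passes), picks the current best candidate and the most promising challenger, and asks a simulated noisy user to compare them.

```python
import math
import random
from statistics import mean, pstdev


def select_pair(score_samples, beta=1.0):
    """Pick (current best, most promising challenger) for the next query.

    score_samples[i] holds several sampled scores for candidate i, standing in
    for draws from a Bayesian ranker's posterior (an assumption of this sketch).
    """
    means = [mean(s) for s in score_samples]
    # Upper confidence bound: posterior mean plus beta times posterior std.
    ucb = [m + beta * pstdev(s) for m, s in zip(means, score_samples)]
    best = max(range(len(means)), key=means.__getitem__)
    challenger = max(
        (i for i in range(len(ucb)) if i != best), key=ucb.__getitem__
    )
    return best, challenger


def simulated_noisy_user(true_utility, a, b, temperature=1.0, rng=random):
    """Return True if the simulated user prefers candidate a over candidate b.

    Preferences follow a logistic model of the utility gap (an assumed noise
    model); a higher temperature makes the answers noisier.
    """
    gap = (true_utility[a] - true_utility[b]) / temperature
    p_prefers_a = 1.0 / (1.0 + math.exp(-gap))
    return rng.random() < p_prefers_a


# Hypothetical sampled scores for three candidate outputs.
samples = [[0.9, 1.0, 1.1], [0.2, 1.6, 0.6], [0.1, 0.1, 0.1]]
best, challenger = select_pair(samples)
prefers_best = simulated_noisy_user([0.3, 0.8, 0.1], best, challenger)
```

Here candidate 1 has a lower mean score than candidate 0 but much higher uncertainty, so it is the one selected as challenger; the resulting preference label would then be used to update the ranker before the next query.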