Negotiation agents need to influence the attitudes or intentions of users to reach a consensus. Strategy planning and expression optimization are both crucial to effective negotiation. However, previous studies have typically focused on only one of these aspects, overlooking the fact that their combined, synergistic effect can lead to better performance. Inspired by the dual-process theory of human cognition, we propose a Dual-Mind Negotiation Agent (DMNA) framework. The framework integrates an intuitive module for rapid, experience-based responses and a deliberative module for slow, careful expression optimization. The intuitive module is trained with Monte Carlo Tree Search (MCTS) and Direct Preference Optimization (DPO), enabling it to produce suitable strategic plans and expressions. The deliberative module employs a multifaceted reflexion mechanism to further improve the quality of expression. Experiments on negotiation datasets confirm that DMNA achieves state-of-the-art results, demonstrating a clear improvement in agents' negotiation ability.
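To make the dual-mind control flow concrete, the following is a minimal sketch of how a fast intuitive pass and a slow deliberative refinement loop could be wired together. All class and method names are hypothetical placeholders for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of the dual-mind inference flow: a fast intuitive draft
# followed by iterative reflexion-and-revision. Names are placeholders.

class IntuitiveModule:
    """Fast, experience-based policy, assumed to be an LLM fine-tuned via MCTS-guided DPO."""

    def respond(self, history):
        # In a real system this would query the fine-tuned LLM with the dialogue history.
        return "draft negotiation utterance"


class DeliberativeModule:
    """Slow module that iteratively critiques and refines the drafted expression."""

    def refine(self, draft, history, max_rounds=3):
        response = draft
        for _ in range(max_rounds):
            critique = self.reflect(response, history)
            if critique is None:  # no remaining issues found
                break
            response = self.revise(response, critique)
        return response

    def reflect(self, response, history):
        # Multifaceted reflexion: e.g., check strategic consistency, tone, persuasiveness.
        return None  # placeholder: None signals "no critique"

    def revise(self, response, critique):
        return response  # placeholder: would rewrite the response given the critique


def negotiate_turn(history):
    draft = IntuitiveModule().respond(history)           # fast, intuitive pass
    return DeliberativeModule().refine(draft, history)   # slow, deliberative pass
```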
Recently, there has been significant interest in replacing the reward model in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs) with methods such as Direct Preference Optimization (DPO) and its variants. These approaches typically apply a binary cross-entropy objective to pairwise samples, i.e., raising the likelihood of the preferred response while lowering that of the dis-preferred one. However, while this training strategy removes the need for a reward model, it also overlooks the varying degrees of preference among different responses. We hypothesize that this is a key factor preventing LLMs from fully understanding human preferences. To address this problem, we propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference-degree loss and combines it with the alignment loss, thereby helping LLMs better understand the degree of preference. Extensive experiments are conducted on two widely used datasets covering different tasks. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods and significantly boosts them to state-of-the-art performance. We also conduct detailed analyses that offer comprehensive insights into SPO and verify its effectiveness. The code is available at https://github.com/lijian16/SPO.
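As a rough illustration of how a preference-degree objective could be combined with a DPO-style alignment loss, here is a PyTorch sketch. The margin-ranking form of the degree loss and the weighting scheme below are assumptions for illustration, not the actual SPO formulation (see the repository linked above for that).

```python
# Sketch: DPO alignment loss plus an assumed self-supervised preference-degree
# loss. The degree loss here is a simple margin-ranking objective over
# responses constructed to have decreasing quality.

import torch
import torch.nn.functional as F


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO alignment loss on a preferred / dis-preferred pair of responses."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()


def preference_degree_loss(scores_by_quality):
    """Assumed degree loss: responses ordered from best to worst should receive
    monotonically decreasing scores, enforced with a ranking margin of 1."""
    loss = 0.0
    for better, worse in zip(scores_by_quality[:-1], scores_by_quality[1:]):
        loss = loss + F.relu(1.0 - (better - worse)).mean()
    return loss / (len(scores_by_quality) - 1)


def combined_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, scores_by_quality, lam=0.5):
    """Alignment loss plus the auxiliary degree loss, weighted by a hypothetical lambda."""
    return (dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l)
            + lam * preference_degree_loss(scores_by_quality))
```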
Interactive argument pair identification is an emerging task in argument mining that aims to determine whether two arguments are interactively related. Prior work has pointed out that the context of an argument is essential for improving identification performance. However, existing context-based methods achieve only limited improvements, since the full context typically contains much irrelevant information. In this paper, we propose a simple contrastive learning framework that addresses this problem by extracting valuable information from the context. The framework constructs hard argument-context samples and, by introducing contrastive learning, obtains robust and uniform representations. We also propose an argument-context extraction module that enhances information extraction by discarding irrelevant blocks. Experimental results show that our method achieves state-of-the-art performance on the benchmark dataset. Further analysis demonstrates the effectiveness of the proposed modules and shows visually that they yield more compact semantic representations.
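For intuition about the contrastive component, the following is a minimal in-batch InfoNCE-style sketch over argument-context pairs. The encoder, temperature, and the way hard argument-context samples enter the batch are assumptions for illustration, not the paper's exact design.

```python
# Sketch: in-batch contrastive loss between arguments and their contexts.
# Each argument's true context is the positive; the other contexts in the
# batch (which may include hard argument-context constructions) act as negatives.

import torch
import torch.nn.functional as F


def argument_context_contrastive_loss(arg_emb, ctx_emb, temperature=0.05):
    """arg_emb, ctx_emb: (batch, dim) embeddings of arguments and their contexts."""
    arg_emb = F.normalize(arg_emb, dim=-1)
    ctx_emb = F.normalize(ctx_emb, dim=-1)
    logits = arg_emb @ ctx_emb.t() / temperature              # (batch, batch) similarities
    labels = torch.arange(arg_emb.size(0), device=arg_emb.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)
```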