Xiaoyu Li

2025

The evolution of Large Language Models (LLMs) has significantly advanced multi-turn conversation systems, emphasizing the need for proactive guidance to enhance users’ interactions. However, these systems face challenges in dynamically adapting to shifts in users’ goals and maintaining low latency for real-time interactions. In the Baidu Search AI assistant, an industrial-scale multi-turn search system, we propose a novel two-phase framework to provide proactive guidance. The first phase, Goal-adaptive Supervised Fine-Tuning (G-SFT), employs a goal adaptation agent that dynamically adapts to user goal shifts and provides goal-relevant contextual information. G-SFT also incorporates scalable knowledge transfer to distill insights from LLMs into a lightweight model for real-time interaction. The second phase, Click-oriented Reinforcement Learning (C-RL), adopts a generate-rank paradigm, systematically constructs preference pairs from user click signals, and proactively improves click-through rates through more engaging guidance. This dual-phase architecture achieves complementary objectives: G-SFT ensures accurate goal tracking, while C-RL optimizes interaction quality through click signal-driven reinforcement learning. Extensive experiments demonstrate that our framework achieves 86.10% accuracy in offline evaluation (+23.95% over baseline) and 25.28% CTR in online deployment (149.06% relative improvement), while reducing inference latency by 69.55% through scalable knowledge distillation.

pdf bib abs
Decoding LLM Personality Measurement: Forced-Choice vs. Likert
Xiaoyu Li | Haoran Shi | Zengyi Yu | Yukun Tu | Chanjin Zheng
Findings of the Association for Computational Linguistics: ACL 2025

Recent research has focused on investigating the psychological characteristics of Large Language Models (LLMs), emphasizing the importance of comprehending their behavioral traits. Likert scale personality questionnaires have become the primary tool for assessing these characteristics in LLMs. However, such scales can be skewed by factors such as social desirability, distorting the assessment of true personality traits. To address this issue, we firstly incorporate the forced-choice test, a method known for reducing response bias in human personality assessments, into the evaluation of LLM. Specifically, we evaluated six LLMs: Llama-3.1-8B, GLM-4-9B, GPT-3.5-turbo, GPT-4o, Claude-3.5-sonnet, and Deepseek-V3. We compared the Likert scale and forced-choice test results for LLMs’ Big Five personality scores, as well as their reliability. In addition, we looked at how temperature parameter and language affected LLM personality scores. The results show that the forced-choice test better captures differences between LLMs across various personality dimensions and is less influenced by temperature parameters. Furthermore, we found both broad trends and specific variations in personality scores across models and languages.

2024

pdf bib abs
Probing Large Language Models from a Human Behavioral Perspective
Xintong Wang | Xiaoyu Li | Xingshan Li | Chris Biemann
Proceedings of the Workshop: Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning (NeusymBridge) @ LREC-COLING-2024

Large Language Models (LLMs) have emerged as dominant foundational models in modern NLP. However, the understanding of their prediction processes and internal mechanisms, such as feed-forward networks (FFN) and multi-head self-attention (MHSA), remains largely unexplored. In this work, we probe LLMs from a human behavioral perspective, correlating values from LLMs with eye-tracking measures, which are widely recognized as meaningful indicators of human reading patterns. Our findings reveal that LLMs exhibit a similar prediction pattern with humans but distinct from that of Shallow Language Models (SLMs). Moreover, with the escalation of LLM layers from the middle layers, the correlation coefficients also increase in FFN and MHSA, indicating that the logits within FFN increasingly encapsulate word semantics suitable for predicting tokens from the vocabulary.

2023

Event Causality Extraction (ECE) aims to extract the cause-effect event pairs from the given text, which requires the model to possess a strong reasoning ability to capture event causalities. However, existing works have not adequately exploited the interactions between the cause and effect event that could provide crucial clues for causality reasoning. To this end, we propose an Implicit Cause-Effect interaction (ICE) framework, which formulates ECE as a template-based conditional generation problem. The proposed method captures the implicit intra- and inter-event interactions by incorporating the privileged information (ground truth event types and arguments) for reasoning, and a knowledge distillation mechanism is introduced to alleviate the unavailability of privileged information in the test stage. Furthermore, to facilitate knowledge transfer from teacher to student, we design an event-level alignment strategy named Cause-Effect Optimal Transport (CEOT) to strengthen the semantic interactions of cause-effect event types and arguments. Experimental results indicate that ICE achieves state-of-the-art performance on the ECE-CCKS dataset.

To create a captivating story, a writer often plans a sequence of logically coherent events and ingeniously manipulates the narrative order to generate flashback in place. However, existing storytelling systems suffer from both insufficient understanding of event correlations and inadequate awareness of event temporal order (e.g., go to hospital <after> get ill), making it challenging to generate high-quality events that balance the logic and narrative order of story. In this paper, we propose a narrative order aware framework BPOT (Bidirectional Pretraining Model with Optimal Transport Reward) for story generation, which presents a bidirectional pretrained model to encode event correlations and pairwise event order. We also design a reinforcement learning algorithm with novel optimal transport reward to further improve the quality of generated events in the fine-tuning stage. Specifically, a narrative order aware event sequence model is pretrained with the joint learning objectives of event blank infilling and pairwise order prediction. Then, reinforcement learning with novel optimal transport reward is designed to further improve the generated event quality in the fine-tuning stage. The novel optimal transport reward captures the mappings between the generated events and the sentences in the story, effectively measuring the quality of generated events. Both automatic and manual evaluation results demonstrate the superiority of our framework in generating logically coherent stories with flashbacks.

Xiaoyu Li

2025

2024

2023

2015

Co-authors

Venues