Jui-I Wang

2026

BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation
Tsung-Min Pai | Jui-I Wang | Li-Chun Lu | Shao-Hua Sun | Hung-yi Lee | Kai-Wei Chang
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Multi-LLM systems enhance the creativity of large language models by simulating human collective intelligence but suffer from significant drawbacks, such as high computational costs and inference latency. To address these limitations, we propose BILLY (BlendIng persona vectors for Large Language model creativitY), a training-free framework that captures the benefits of multi-LLM collaboration, i.e., inducing diverse perspectives and specialized expertise, within a single model. BILLY operates by extracting and blending multiple distinct persona vectors directly in the model’s activation space. We steer the model’s generation process with this merged vector during inference, enabling multi-perspective output without explicit multi-LLM communication. Our experiments across creativity-oriented benchmarks demonstrate that BILLY surpasses single-model prompting and traditional multi-LLM approaches, while substantially reducing inference time and computational costs. Our analyses further reveal that distinct persona vectors can be blended to achieve both effective control over complementary aspects of generation and greater interpretability.

2025

MESAQA: A Dataset for Multi-Span Contextual and Evidence-Grounded Question Answering
Jui-I Wang | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 31st International Conference on Computational Linguistics

We introduce MESAQA, a novel dataset focusing on multi-span contextual understanding question answering (QA). Unlike traditional single-span QA systems, questions in our dataset consider information from multiple spans within the context document. MESAQA supports evidence-grounded QA, which requires a model both to generate answers and to identify multiple pieces of evidence. Our automated dataset creation method leverages the MASH-QA dataset and large language models (LLMs) to ensure that each Q/A pair requires considering all selected spans. Experimental results show that current models struggle with multi-span contextual QA, underscoring the need for new approaches. Our dataset sets a benchmark for this emerging QA paradigm, promoting research in complex information retrieval and synthesis.

Co-authors

Tsung-Min Pai 1

Shao-Hua Sun 1

Venues

COLING 1
EACL 1
