An Xiao


2024

CSLM: A Framework for Question Answering Dataset Generation through Collaborative Small Language Models
Yiming Wang | Yang Liu | Lingchen Wang | An Xiao
Findings of the Association for Computational Linguistics: EMNLP 2024

Collecting high-quality question-answer (QA) pairs is vital for training large language models (LLMs), yet this process is traditionally laborious and time-intensive. With the rapid evolution of LLMs, the potential of using these models to generate QA pairs autonomously has become apparent, particularly with large-scale models such as GPT-4. However, the computational demands and associated costs often make such approaches prohibitive for the average researcher. To address this gap, we introduce the Collaborative Small Language Model Framework (CSLM), which combines a group of small-scale, open-source LLMs that collaborate to produce QA pairs. Experiments on datasets from various domains show that CSLM harnesses the full potential of diverse small models to generate high-quality QA pairs, making QA dataset generation accessible to a broader range of researchers.