Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis

Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, Yugo Murawaki


Abstract
Recent studies have shown that few-shot learning enables large language models (LLMs) to generate training data for supervised models at low cost. For complex tasks, however, the quality of LLM-generated data often falls short of human-labeled data. This raises a critical question: how should one balance the trade-off between higher-quality but more expensive human-annotated data and lower-quality yet far cheaper LLM-generated data? In this paper, we tackle this question for a demanding task, conversational semantic frame analysis (SFA), and propose a novel method for synthesizing training data tailored to it. Through experiments conducted across a wide range of budget levels, we find that smaller budgets favor a heavier reliance on LLM-generated data to achieve optimal cost-efficiency.
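To make the cost trade-off concrete, here is a minimal toy sketch in Python. It is not the paper's method: every quantity (HUMAN_COST, LLM_COST, LLM_QUALITY, LLM_CAP) and the saturating utility model are invented assumptions for illustration only.

```python
# Hypothetical sketch of the budget trade-off described in the abstract.
# All numbers and the utility model are invented for illustration; they
# are NOT the paper's method or its measured costs.

HUMAN_COST = 1.00   # assumed cost per human-annotated example
LLM_COST = 0.05     # assumed cost per LLM-generated example
LLM_QUALITY = 0.6   # assumed utility of an LLM example vs. a human one
LLM_CAP = 500       # assumed point where extra LLM data stops helping

def effective_data(budget: float, human_frac: float) -> float:
    """Quality-weighted data volume for a given split of the budget.
    Crude stand-in for the idea that LLM data is cheap but its benefit
    saturates, while human labels keep paying off."""
    human_n = budget * human_frac / HUMAN_COST
    llm_n = budget * (1.0 - human_frac) / LLM_COST
    return human_n + LLM_QUALITY * min(llm_n, LLM_CAP)

def best_human_fraction(budget: float) -> float:
    """Grid-search the share of the budget spent on human annotation."""
    fractions = [i / 100 for i in range(101)]
    return max(fractions, key=lambda f: effective_data(budget, f))

for budget in (10, 100, 1000):
    f = best_human_fraction(budget)
    print(f"budget={budget:5}: spend {f:.0%} on human labels")
```

Under these toy assumptions, the optimal share spent on human labels grows with the budget (0% at budget 10, about 75% at 100, about 97% at 1000), so tight budgets lean on cheap LLM-generated data. The saturation cap is a deliberate stand-in for a quality ceiling on synthetic data; the paper instead measures this trade-off empirically across budget levels.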
Anthology ID:
2025.latechclfl-1.21
Volume:
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Anna Kazantseva, Stan Szpakowicz, Stefania Degaetano-Ortlieb, Yuri Bizzoni, Janis Pagel
Venues:
LaTeCHCLfL | WS
Publisher:
Association for Computational Linguistics
Pages:
238–251
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.latechclfl-1.21/
Cite (ACL):
Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, and Yugo Murawaki. 2025. Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis. In Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025), pages 238–251, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis (Matta et al., LaTeCHCLfL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.latechclfl-1.21.pdf