Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis

Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, Yugo Murawaki


Abstract
Recent studies have shown that few-shot learning enables large language models (LLMs) to generate training data for supervised models at low cost. For complex tasks, however, the quality of LLM-generated data often falls short of human-labeled data. This raises a critical question: how should one balance the trade-off between higher-quality but more expensive human-annotated data and lower-quality yet far cheaper LLM-generated data? In this paper, we tackle this question for a demanding task, conversational semantic frame analysis (SFA), and propose a novel method for synthesizing training data tailored to it. Through experiments conducted across a wide range of budget levels, we find that smaller budgets favor a heavier reliance on LLM-generated data to achieve optimal cost-efficiency.
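To make the cost trade-off concrete, here is a minimal toy sketch in Python. It is not the paper's method: every quantity (HUMAN_COST, LLM_COST, LLM_QUALITY, LLM_CAP) and the saturating utility model are invented assumptions for illustration only.

```python
# Hypothetical sketch of the budget trade-off described in the abstract.
# All numbers and the utility model are invented for illustration; they
# are NOT the paper's method or its measured costs.

HUMAN_COST = 1.00   # assumed cost per human-annotated example
LLM_COST = 0.05     # assumed cost per LLM-generated example
LLM_QUALITY = 0.6   # assumed utility of an LLM example vs. a human one
LLM_CAP = 500       # assumed point where extra LLM data stops helping

def effective_data(budget: float, human_frac: float) -> float:
    """Quality-weighted data volume for a given split of the budget.
    Crude stand-in for the idea that LLM data is cheap but its benefit
    saturates, while human labels keep paying off."""
    human_n = budget * human_frac / HUMAN_COST
    llm_n = budget * (1.0 - human_frac) / LLM_COST
    return human_n + LLM_QUALITY * min(llm_n, LLM_CAP)

def best_human_fraction(budget: float) -> float:
    """Grid-search the share of the budget spent on human annotation."""
    fractions = [i / 100 for i in range(101)]
    return max(fractions, key=lambda f: effective_data(budget, f))

for budget in (10, 100, 1000):
    f = best_human_fraction(budget)
    print(f"budget={budget:5}: spend {f:.0%} on human labels")
```

Under these toy assumptions, the optimal share spent on human labels grows with the budget (0% at budget 10, about 75% at 100, about 97% at 1000), so tight budgets lean on cheap LLM-generated data. The saturation cap is a deliberate stand-in for a quality ceiling on synthetic data; the paper instead measures this trade-off empirically across budget levels.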
Anthology ID:
2025.latechclfl-1.21
Volume:
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Anna Kazantseva, Stan Szpakowicz, Stefania Degaetano-Ortlieb, Yuri Bizzoni, Janis Pagel
Venues:
LaTeCHCLfL | WS
Publisher:
Association for Computational Linguistics
Pages:
238–251
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.latechclfl-1.21/
Cite (ACL):
Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, and Yugo Murawaki. 2025. Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis. In Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025), pages 238–251, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis (Matta et al., LaTeCHCLfL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.latechclfl-1.21.pdf