Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis
Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, Yugo Murawaki
Abstract
Recent studies have shown that few-shot learning enables large language models (LLMs) to generate training data for supervised models at a low cost. However, for complex tasks, the quality of LLM-generated data often falls short compared to human-labeled data. This presents a critical challenge: how should one balance the trade-off between the higher quality but more expensive human-annotated data and the lower quality yet significantly cheaper LLM-generated data? In this paper, we tackle this question for a demanding task: conversational semantic frame analysis (SFA). To address this, we propose a novel method for synthesizing training data tailored to this complex task. Through experiments conducted across a wide range of budget levels, we find that smaller budgets favor a higher reliance on LLM-generated data to achieve optimal cost-efficiency.
- Anthology ID:
- 2025.latechclfl-1.21
- Volume:
- Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)
- Month:
- May
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Anna Kazantseva, Stan Szpakowicz, Stefania Degaetano-Ortlieb, Yuri Bizzoni, Janis Pagel
- Venues:
- LaTeCHCLfL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 238–251
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.latechclfl-1.21/
- Cite (ACL):
- Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, and Yugo Murawaki. 2025. Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis. In Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025), pages 238–251, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis (Matta et al., LaTeCHCLfL 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.latechclfl-1.21.pdf