Speaker Identification and Dataset Construction Using LLMs: A Case Study on Japanese Narratives

Seiji Gobara, Hidetaka Kamigaito, Taro Watanabe


Abstract
Speaker identification in narrative analysis is a challenging task due to complex dialogues, diverse utterance patterns, and ambiguous character references. Costly and time-intensive manual annotation limits the scalability of high-quality dataset creation. This study demonstrates a cost-efficient approach to constructing speaker identification datasets by combining small-scale manual annotation with LLM-based labeling. A subset of the data is manually annotated and used to guide LLM predictions in a few-shot setting, followed by refinement through minimal human corrections. Our results show that LLMs achieve approximately 90% accuracy on challenging narratives, such as the “Three Kingdoms” dataset, underscoring the importance of targeted human corrections. This approach proves effective for constructing scalable and cost-efficient datasets for Japanese and complex narratives.
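The few-shot labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Example` record, the prompt wording, and the function names `build_few_shot_prompt` and `accuracy` are all hypothetical, and the call to an actual LLM is left out. It only shows how a small manually annotated subset might be turned into a prompt, and how LLM labels might be scored against human-corrected ones.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One manually annotated utterance (hypothetical record layout)."""
    context: str
    utterance: str
    speaker: str


def build_few_shot_prompt(examples, context, utterance, candidates):
    """Assemble a few-shot prompt from annotated examples plus one query.

    The instruction text and field labels are assumptions made for
    illustration, not the prompt used in the paper.
    """
    lines = ["Identify the speaker of the utterance. Answer with one candidate name.", ""]
    for ex in examples:
        lines += [
            f"Context: {ex.context}",
            f"Utterance: {ex.utterance}",
            f"Speaker: {ex.speaker}",
            "",
        ]
    lines += [
        f"Candidates: {', '.join(candidates)}",
        f"Context: {context}",
        f"Utterance: {utterance}",
        "Speaker:",
    ]
    return "\n".join(lines)


def accuracy(predicted, gold):
    """Fraction of predicted speaker labels that match the corrected labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

In such a setup, the labels returned by the LLM would be passed to `accuracy` together with the human-corrected labels, and the utterances where the two disagree are the ones that needed targeted correction.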
Anthology ID:
2025.wnu-1.17
Volume:
Proceedings of the 7th Workshop on Narrative Understanding
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Elizabeth Clark, Yash Kumar Lal, Snigdha Chaturvedi, Mohit Iyyer, Anneliese Brei, Ashutosh Modi, Khyathi Raghavi Chandu
Venues:
WNU | WS
Publisher:
Association for Computational Linguistics
Pages:
97–119
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.wnu-1.17/
Cite (ACL):
Seiji Gobara, Hidetaka Kamigaito, and Taro Watanabe. 2025. Speaker Identification and Dataset Construction Using LLMs: A Case Study on Japanese Narratives. In Proceedings of the 7th Workshop on Narrative Understanding, pages 97–119, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Speaker Identification and Dataset Construction Using LLMs: A Case Study on Japanese Narratives (Gobara et al., WNU 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.wnu-1.17.pdf