Speaker Identification and Dataset Construction Using LLMs: A Case Study on Japanese Narratives

Seiji Gobara; Hidetaka Kamigaito; Taro Watanabe

Abstract

Speaker identification in narrative analysis is a challenging task due to complex dialogues, diverse utterance patterns, and ambiguous character references. Costly and time-intensive manual annotation limits the scalability of high-quality dataset creation. This study demonstrates a cost-efficient approach to constructing speaker identification datasets by combining small-scale manual annotation with LLM-based labeling. A subset of the data is manually annotated and used as few-shot examples to guide LLM predictions, which are then refined through minimal human corrections. Our results show that LLMs achieve approximately 90% accuracy on challenging narratives such as the “Three Kingdoms” dataset, while the remaining errors underscore the importance of targeted human corrections. This approach proves effective for constructing scalable and cost-efficient datasets for Japanese and complex narratives.
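The labeling loop described above (a small manually annotated subset used as few-shot demonstrations, LLM predictions on the rest, then targeted human corrections) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the `Utterance` record, and the helpers `build_few_shot_prompt`, `llm_label`, `apply_corrections`, and `accuracy` are all hypothetical, `llm` stands in for any function that sends a prompt to a model and returns its text, and the toy data is invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Utterance:
    context: str                   # surrounding narrative text (hypothetical field)
    text: str                      # the quoted utterance
    speaker: Optional[str] = None  # gold speaker; set only for manually annotated items


def build_few_shot_prompt(shots: List[Utterance], target: Utterance,
                          candidates: List[str]) -> str:
    """Format manually annotated items as few-shot demonstrations (assumed prompt format)."""
    lines = ["Identify who speaks the utterance.",
             "Candidates: " + ", ".join(candidates), ""]
    for ex in shots:
        lines += [f"Context: {ex.context}", f"Utterance: {ex.text}",
                  f"Speaker: {ex.speaker}", ""]
    lines += [f"Context: {target.context}", f"Utterance: {target.text}", "Speaker:"]
    return "\n".join(lines)


def llm_label(shots: List[Utterance], unlabeled: List[Utterance],
              candidates: List[str], llm: Callable[[str], str]) -> List[str]:
    """Predict a speaker for each unlabeled utterance using any prompt -> text callable."""
    return [llm(build_few_shot_prompt(shots, u, candidates)).strip() for u in unlabeled]


def apply_corrections(preds: List[str], corrections: Dict[int, str]) -> List[str]:
    """Overwrite only the predictions a human annotator chose to fix (index -> speaker)."""
    return [corrections.get(i, p) for i, p in enumerate(preds)]


def accuracy(preds: List[str], gold: List[str]) -> float:
    """Fraction of predictions that exactly match the gold speaker labels."""
    if len(preds) != len(gold) or not gold:
        raise ValueError("preds and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; always answers the same name."""
    return " Zhang Fei\n"


if __name__ == "__main__":
    shots = [Utterance("Liu Bei turned to Guan Yu.", "Brother, let us go.", "Liu Bei")]
    unlabeled = [Utterance("Zhang Fei shouted.", "Who dares face me?"),
                 Utterance("Cao Cao laughed.", "Heroes are few.")]
    candidates = ["Liu Bei", "Guan Yu", "Zhang Fei", "Cao Cao"]
    gold = ["Zhang Fei", "Cao Cao"]

    preds = llm_label(shots, unlabeled, candidates, stub_llm)
    print(preds)                                  # ['Zhang Fei', 'Zhang Fei']
    fixed = apply_corrections(preds, {1: "Cao Cao"})
    print(accuracy(preds, gold), accuracy(fixed, gold))  # 0.5 1.0
```

Modeling corrections as an index-to-speaker map means only the predictions a human actually edits change, which mirrors the idea of minimal, targeted correction; how the paper selects which items to correct is not specified here.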

Anthology ID: 2025.wnu-1.17
Volume: Proceedings of the 7th Workshop on Narrative Understanding
Month: May
Year: 2025
Address: Albuquerque, New Mexico
Editors: Elizabeth Clark, Yash Kumar Lal, Snigdha Chaturvedi, Mohit Iyyer, Anneliese Brei, Ashutosh Modi, Khyathi Raghavi Chandu
Venues: WNU | WS
Publisher: Association for Computational Linguistics
Pages: 97–119
URL: https://preview.aclanthology.org/fix-sig-urls/2025.wnu-1.17/
Cite (ACL): Seiji Gobara, Hidetaka Kamigaito, and Taro Watanabe. 2025. Speaker Identification and Dataset Construction Using LLMs: A Case Study on Japanese Narratives. In Proceedings of the 7th Workshop on Narrative Understanding, pages 97–119, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Speaker Identification and Dataset Construction Using LLMs: A Case Study on Japanese Narratives (Gobara et al., WNU 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.wnu-1.17.pdf