Abstract
Summarization of multi-party conversation is an important task in natural language processing. In this paper, we describe a Japanese corpus and a topic segmentation task. To the best of our knowledge, it is the first Japanese corpus that is annotated for summarization tasks and freely available to anyone. We call it “the Kyutech corpus.” The conversations in the corpus come from a decision-making task with four participants, and the corpus contains utterances with time information, topic segmentation, and reference summaries. As a case study for the corpus, we describe a method that combines LCSeg and TopicTiling for the topic segmentation task. We discuss the effectiveness and the problems of the combined method through an experiment with the Kyutech corpus.

- Anthology ID: W16-5412
- Volume: Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
- Month: December
- Year: 2016
- Address: Osaka, Japan
- Venue: ALR
- Publisher: The COLING 2016 Organizing Committee
- Pages: 95–104
- URL: https://aclanthology.org/W16-5412
- Cite (ACL): Takashi Yamamura, Kazutaka Shimada, and Shintaro Kawahara. 2016. The Kyutech corpus and topic segmentation using a combined method. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12), pages 95–104, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal): The Kyutech corpus and topic segmentation using a combined method (Yamamura et al., ALR 2016)
- PDF: https://preview.aclanthology.org/ingestion-script-update/W16-5412.pdf