NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages
Ayu Purwarianti, Dea Adhista, Agung Baptiso, Miftahul Mahfuzh, Yusrina Sabila, Aulia Adila, Samuel Cahyawijaya, Alham Fikri Aji
Abstract
Developing dialogue summarization for extremely low-resource languages is a challenging task. We introduce NusaDialogue, a dialogue summarization dataset for three underrepresented languages in the Malayo-Polynesian language family: Minangkabau, Balinese, and Buginese. NusaDialogue covers 17 topics and 185 subtopics, with annotations provided by 73 native speakers. We also conducted experiments that fine-tune a medium-sized language model designed specifically for Indonesian, and that apply zero- and few-shot prompting to various multilingual large language models (LLMs). The results indicate that, for extremely low-resource languages such as Minangkabau, Balinese, and Buginese, fine-tuning yields significantly higher performance than zero- and few-shot prompting, even when the prompted LLMs have considerably more parameters.
- Anthology ID:
- 2025.sealp-1.8
- Volume:
- Proceedings of the Second Workshop in South East Asian Language Processing
- Month:
- January
- Year:
- 2025
- Address:
- Online
- Editors:
- Derry Wijaya, Alham Fikri Aji, Clara Vania, Genta Indra Winata, Ayu Purwarianti
- Venues:
- sealp | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 82–100
- URL:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.sealp-1.8/
- Cite (ACL):
- Ayu Purwarianti, Dea Adhista, Agung Baptiso, Miftahul Mahfuzh, Yusrina Sabila, Aulia Adila, Samuel Cahyawijaya, and Alham Fikri Aji. 2025. NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages. In Proceedings of the Second Workshop in South East Asian Language Processing, pages 82–100, Online. Association for Computational Linguistics.
- Cite (Informal):
- NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages (Purwarianti et al., sealp 2025)
- PDF:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.sealp-1.8.pdf