A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization

Fajri Koto


Abstract
In this paper, we report our effort to construct the first Indonesian corpus for chat summarization. Specifically, we used multi-participant chat documents from a well-known online instant messaging application, WhatsApp. We constructed the gold standard by asking three native speakers to manually summarize 300 chat sections (152 of which contain images). As a result, three reference summaries, in both extractive and abstractive form, are produced for each chat section. The corpus is still at an early stage of investigation, leaving many possibilities open for future work.
Anthology ID:
L16-1129
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
801–805
URL:
https://aclanthology.org/L16-1129
Cite (ACL):
Fajri Koto. 2016. A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 801–805, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization (Koto, LREC 2016)
PDF:
https://preview.aclanthology.org/ingestion-script-update/L16-1129.pdf