Abstractive Text Summarization with Application to Bulgarian News Articles

Nikola Taushanov, Ivan Koychev, Preslav Nakov


Abstract
With the development of the Internet, a huge amount of information becomes available every day. Therefore, text summarization has become a critical part of our first access to information. There are two major approaches to automatic text summarization: abstractive and extractive. In this work, we apply abstractive summarization algorithms to a corpus of Bulgarian news articles. In particular, we compare selected algorithms of both approaches, and we show results providing evidence that the selected state-of-the-art algorithms for abstractive text summarization perform better than the extractive ones on articles in Bulgarian. For the purpose of our experiments, we collected a new dataset consisting of around 70,000 news articles and their topics. For research purposes, we are also sharing the tools to easily collect and process such datasets.
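The two approaches differ in what they output. An extractive system only selects sentences that already appear in the article, whereas an abstractive system generates new wording. The following toy frequency-based extractive baseline is a minimal sketch for illustration only. It is not one of the algorithms evaluated in the paper, and the function name `extractive_summary` and its scoring rule are hypothetical.

```python
import re
from collections import Counter


def extractive_summary(text: str, n_sentences: int = 2) -> list[str]:
    """Toy extractive summarizer (hypothetical, not the paper's method):
    returns the n highest-scoring sentences of `text`, in original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Document-wide word frequencies; \w is Unicode-aware for str patterns,
    # so Cyrillic (Bulgarian) words are tokenized too.
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        # Average frequency of the sentence's words (length-normalized).
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Rank sentence indices by score, keep the top n, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


if __name__ == "__main__":
    article = "Cats sleep. Cats eat fish. Dogs bark loudly."
    print(extractive_summary(article, 2))
```

Every sentence this function returns is copied verbatim from the input. An abstractive system, by contrast, relies on a trained generation model that can produce wording absent from the source, which is why it cannot be reduced to a sentence-selection rule like this one.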
Anthology ID: 2018.clib-1.4
Volume: Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018)
Month: May
Year: 2018
Address: Sofia, Bulgaria
Venue: CLIB
Publisher: Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Pages: 15–22
URL: https://preview.aclanthology.org/icon-24-ingestion/2018.clib-1.4/
Cite (ACL): Nikola Taushanov, Ivan Koychev, and Preslav Nakov. 2018. Abstractive Text Summarization with Application to Bulgarian News Articles. In Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018), pages 15–22, Sofia, Bulgaria. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences.
Cite (Informal): Abstractive Text Summarization with Application to Bulgarian News Articles (Taushanov et al., CLIB 2018)
PDF: https://preview.aclanthology.org/icon-24-ingestion/2018.clib-1.4.pdf