Abstract
A multi-document summarizer finds the key topics in multiple textual sources and organizes information around them. In this paper we propose a summarization method for Persian text that uses paragraph vectors, which can represent textual units of arbitrary length. We use these vectors to calculate the semantic relatedness between documents, cluster them into a predetermined number of groups, weight them by their distance to the cluster centroids and by intra-cluster homogeneity, and extract the key paragraphs. We compare the final summaries with the gold-standard summaries of 21 digital topics using the ROUGE evaluation metric. Experimental results show the advantages of using paragraph vectors over earlier attempts at developing similar methods for a low-resource language like Persian.
- Anthology ID:
- R17-2005
- Volume:
- Proceedings of the Student Research Workshop Associated with RANLP 2017
- Month:
- September
- Year:
- 2017
- Address:
- Varna
- Editors:
- Venelin Kovatchev, Irina Temnikova, Pepa Gencheva, Yasen Kiprov, Ivelina Nikolova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 35–40
- URL:
- https://doi.org/10.26615/issn.1314-9156.2017_005
- DOI:
- 10.26615/issn.1314-9156.2017_005
- Cite (ACL):
- Morteza Rohanian. 2017. Multi-Document Summarization of Persian Text using Paragraph Vectors. In Proceedings of the Student Research Workshop Associated with RANLP 2017, pages 35–40, Varna. INCOMA Ltd.
- Cite (Informal):
- Multi-Document Summarization of Persian Text using Paragraph Vectors (Rohanian, RANLP 2017)
- PDF:
- https://doi.org/10.26615/issn.1314-9156.2017_005
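
The pipeline outlined in the abstract (paragraph vectors, clustering into a predetermined number of groups, weighting by centroid distance and intra-cluster homogeneity, extracting key paragraphs) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the paragraph vectors have already been computed by some embedding model, it uses a plain k-means with cosine distance, and the scoring formula and the names `kmeans` and `select_key_paragraphs` are hypothetical choices made only for illustration.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means with cosine distance (an assumed clustering choice)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def select_key_paragraphs(vectors, k, seed=0):
    """Pick one representative paragraph index per cluster.

    Hypothetical scoring: closeness of the paragraph to its centroid,
    multiplied by the cluster's homogeneity (1 - mean member distance).
    Returns indices ordered from highest to lowest score.
    """
    labels, centroids = kmeans(vectors, k, seed=seed)
    picks = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if not members:
            continue
        dists = {i: cosine_distance(vectors[i], centroids[c]) for i in members}
        homogeneity = 1.0 - sum(dists.values()) / len(members)
        best = min(members, key=dists.get)
        picks.append((homogeneity * (1.0 - dists[best]), best))
    picks.sort(reverse=True)
    return [index for _, index in picks]
```

Calling `select_key_paragraphs(vectors, k)` with one vector per paragraph returns paragraph indices, which a caller would map back to the source text to assemble the summary. In practice the vectors would come from a trained paragraph-vector model rather than being hand-written.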