Multi-Document Summarization of Persian Text using Paragraph Vectors

Morteza Rohanian


Abstract
A multi-document summarizer finds the key topics in multiple textual sources and organizes information around them. In this paper, we propose a summarization method for Persian text using paragraph vectors, which can represent textual units of arbitrary length. We use these vectors to calculate the semantic relatedness between documents, cluster them into a predetermined number of groups, weight them based on their distance to the centroids and the intra-cluster homogeneity, and extract the key paragraphs. We compare the final summaries with the gold-standard summaries of 21 digital topics using the ROUGE evaluation metric. Experimental results show the advantages of using paragraph vectors over earlier attempts at developing similar methods for a low-resource language like Persian.
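The pipeline described in the abstract (cluster paragraph vectors, weight by centroid distance and cluster homogeneity, extract key paragraphs) can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the paragraph vectors have already been computed elsewhere, and the function names (`cosine_distance`, `cluster`, `select_key_paragraphs`), the farthest-point initialization, the use of cosine distance, and the scoring formula (centroid similarity multiplied by a homogeneity term) are all illustrative assumptions, since the abstract does not specify them.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def cluster(vectors, k, max_iter=50):
    """Simple k-means-style clustering with cosine distance.

    Assumes `vectors` is non-empty and k <= len(vectors).
    Initialization (an assumption of this sketch): start from the first
    vector, then repeatedly add the vector farthest from the chosen centroids.
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(cosine_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_key_paragraphs(paragraphs, vectors, k, max_paragraphs):
    """Pick the most central paragraph of each cluster, ranked by a
    hypothetical score: centroid similarity times cluster homogeneity."""
    labels, centroids = cluster(vectors, k)
    best = {}
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        dists = {i: cosine_distance(vectors[i], centroids[c]) for i in members}
        # Homogeneity (assumed definition): 1 - mean distance to the centroid.
        homogeneity = 1.0 - sum(dists.values()) / len(members)
        top = min(members, key=lambda i: dists[i])
        best[top] = (1.0 - dists[top]) * homogeneity
    chosen = sorted(best, key=best.get, reverse=True)[:max_paragraphs]
    # Return the selected paragraphs in their original order.
    return [paragraphs[i] for i in sorted(chosen)]
```

In practice the input vectors would come from a trained paragraph-vector model rather than hand-written lists, and the number of clusters and the summary length would be tuned against the gold-standard summaries.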
Anthology ID:
R17-2005
Volume:
Proceedings of the Student Research Workshop Associated with RANLP 2017
Month:
September
Year:
2017
Address:
Varna
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
35–40
URL:
https://doi.org/10.26615/issn.1314-9156.2017_005
DOI:
10.26615/issn.1314-9156.2017_005
Cite (ACL):
Morteza Rohanian. 2017. Multi-Document Summarization of Persian Text using Paragraph Vectors. In Proceedings of the Student Research Workshop Associated with RANLP 2017, pages 35–40, Varna. INCOMA Ltd.
Cite (Informal):
Multi-Document Summarization of Persian Text using Paragraph Vectors (Rohanian, RANLP 2017)
PDF:
https://doi.org/10.26615/issn.1314-9156.2017_005