Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus

Wei Liu, Lei Li, Zuying Huang, Yinan Liu


Abstract
The MultiLing 2019 Headline Generation Task on a Wikipedia corpus raised a critical and practical problem: multilingual tasks on low-resource corpora. In this paper, we propose the QDAS extractive summarization model enhanced by sentence2vec, and we apply transfer learning based on a large multilingual pre-trained language model to the Wikipedia Headline Generation task. We treat headline generation as a sequence labeling task and develop two schemes to handle it. Experimental results show that a large pre-trained model can effectively use its learned knowledge to extract specific phrases from low-resource supervised data.
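The abstract treats headline generation as sequence labeling. Below is a minimal sketch of what that framing can look like. It assumes a simple BIO scheme in which the title appears verbatim as a contiguous span of the article tokens. The helpers `title_to_bio` and `bio_to_span` are hypothetical illustrations, not the paper's two schemes, and the step where a pre-trained model predicts the labels is omitted.

```python
from typing import List


def title_to_bio(tokens: List[str], title_tokens: List[str]) -> List[str]:
    """Hypothetical helper: tag the first verbatim occurrence of the title
    in the article tokens with B/I labels and everything else with O."""
    labels = ["O"] * len(tokens)
    n = len(title_tokens)
    if n == 0:
        return labels
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == title_tokens:
            labels[start] = "B"
            for i in range(start + 1, start + n):
                labels[i] = "I"
            break  # only the first match is labeled
    return labels


def bio_to_span(tokens: List[str], labels: List[str]) -> List[str]:
    """Hypothetical helper: decode the first B-I* span back into a headline."""
    span: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab == "B":
            if span:
                break  # a second span starts; keep only the first
            span = [tok]
        elif lab == "I" and span:
            span.append(tok)
        elif span:
            break  # the span has ended
    return span


tokens = ["The", "Black", "Sea", "is", "an", "inland", "sea"]
labels = title_to_bio(tokens, ["Black", "Sea"])
print(labels)                       # ['O', 'B', 'I', 'O', 'O', 'O', 'O']
print(bio_to_span(tokens, labels))  # ['Black', 'Sea']
```

In a transfer-learning setup, the labels produced by `title_to_bio` would serve as token-level training targets, and `bio_to_span` would turn a model's predicted tags back into a headline. Titles that do not occur verbatim in the text receive all-`O` labels under this assumption, which is one limitation of a purely extractive framing.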
Anthology ID:
W19-8904
Volume:
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
17–25
URL:
https://aclanthology.org/W19-8904
DOI:
10.26615/978-954-452-058-8_004
Cite (ACL):
Wei Liu, Lei Li, Zuying Huang, and Yinan Liu. 2019. Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus. In Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources, pages 17–25, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus (Liu et al., RANLP 2019)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/W19-8904.pdf