Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization

Dongqi Pu, Xudong Hong, Pin-Jie Lin, Ernie Chang, Vera Demberg


Abstract
The Creative Summarization Shared Task at COLING 2022 aims to generate summaries of long-form creative writing. This paper presents the system architecture and results of our participation in the Scriptbase track, which focuses on generating movie plots from movie scripts. The core innovation of our model is a two-stage hierarchical architecture for movie script summarization. In the first stage, a heuristic extraction method extracts actions and essential dialogues, reducing the average length of input movie scripts by 66%, from about 24K to 8K tokens. In the second stage, a state-of-the-art encoder-decoder model, Longformer-Encoder-Decoder (LED), is trained with effective fine-tuning methods, BitFit and NoisyTune. Evaluations on the unseen test set indicate that our system outperforms both zero-shot LED baselines and other participants on various automatic metrics and ranks 1st in the Scriptbase track.
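
The two fine-tuning methods named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes the commonly described forms of the techniques: NoisyTune perturbs each parameter tensor with uniform noise scaled by that tensor's own standard deviation before fine-tuning, and BitFit updates only bias terms while all other parameters stay frozen. The parameter names, the toy values, the `noise_lambda` setting, the use of population standard deviation, and the choice to apply the noise first and then restrict updates to biases are all assumptions made for illustration; the abstract does not state how the two methods are combined.

```python
import random
import statistics


def noisy_tune(params, noise_lambda=0.15, seed=0):
    """Return a perturbed copy of params (NoisyTune-style, illustrative only).

    Each tensor (here: a flat list of floats) receives uniform noise in
    [-noise_lambda / 2, noise_lambda / 2], scaled by that tensor's std.
    """
    rng = random.Random(seed)
    noisy = {}
    for name, values in params.items():
        std = statistics.pstdev(values) if len(values) > 1 else 0.0
        noisy[name] = [
            v + rng.uniform(-noise_lambda / 2, noise_lambda / 2) * std
            for v in values
        ]
    return noisy


def bitfit_trainable(params):
    """BitFit-style selection: only parameters whose name ends in 'bias'."""
    return {name for name in params if name.endswith("bias")}


def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one plain SGD update to trainable tensors; copy the rest unchanged."""
    return {
        name: (
            [v - lr * g for v, g in zip(values, grads[name])]
            if name in trainable
            else list(values)
        )
        for name, values in params.items()
    }


# Hypothetical toy "model": names and values are made up for this sketch.
params = {
    "encoder.layer0.weight": [0.5, -0.2, 0.1, 0.4],
    "encoder.layer0.bias": [0.0, 0.1],
}
# Dummy gradients of 1.0 for every parameter.
grads = {name: [1.0] * len(values) for name, values in params.items()}

noisy = noisy_tune(params)            # perturb once, before fine-tuning
trainable = bitfit_trainable(noisy)   # only bias tensors will be updated
updated = sgd_step(noisy, grads, trainable)
```

In a real setup the same idea would be applied to the tensors of a pretrained LED checkpoint; restricting updates to bias terms keeps the number of trained parameters small, which is the property that matters in a low-resource setting.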
Anthology ID:
2022.creativesumm-1.9
Volume:
Proceedings of The Workshop on Automatic Summarization for Creative Writing
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
CreativeSumm
Publisher:
Association for Computational Linguistics
Pages:
57–66
URL:
https://aclanthology.org/2022.creativesumm-1.9
Cite (ACL):
Dongqi Pu, Xudong Hong, Pin-Jie Lin, Ernie Chang, and Vera Demberg. 2022. Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization. In Proceedings of The Workshop on Automatic Summarization for Creative Writing, pages 57–66, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization (Pu et al., CreativeSumm 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.creativesumm-1.9.pdf
Data
LRA