Abstract
Recent research using pre-trained language models for the multi-document summarization (MDS) task lacks a deep investigation of potential erroneous cases and of the applicability of such models to languages other than English. In this work, we apply a pre-trained language model (BART) to the MDS task, both with and without fine-tuning. We use two English datasets and one German dataset for this study. First, we reproduce multi-document summaries for English by following one of the recent studies. Next, we show the applicability of the model to German by achieving state-of-the-art performance on German MDS. We perform an in-depth error analysis of the followed approach for both languages, which leads us to identify the most notable error types, ranging from made-up facts to topic delimitation, and to quantify the amount of extractiveness.
- Anthology ID:
- 2021.nodalida-main.43
- Volume:
- Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
- Month:
- 31 May – 2 June
- Year:
- 2021
- Address:
- Reykjavik, Iceland (Online)
- Editors:
- Simon Dobnik, Lilja Øvrelid
- Venue:
- NoDaLiDa
- Publisher:
- Linköping University Electronic Press, Sweden
- Pages:
- 391–397
- URL:
- https://aclanthology.org/2021.nodalida-main.43
- Cite (ACL):
- Timo Johner, Abhik Jana, and Chris Biemann. 2021. Error Analysis of using BART for Multi-Document Summarization: A Study for English and German Language. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 391–397, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
- Cite (Informal):
- Error Analysis of using BART for Multi-Document Summarization: A Study for English and German Language (Johner et al., NoDaLiDa 2021)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/2021.nodalida-main.43.pdf
- Code
- uhh-lt/multi-summ-german
- Data
- CNN/Daily Mail
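
The setup described in the abstract (applying pre-trained BART to multi-document summarization, with and without fine-tuning) can be illustrated with a minimal sketch using the Hugging Face transformers library. This is an assumption-laden illustration, not the authors' exact pipeline: the checkpoint name `facebook/bart-large-cnn` (BART fine-tuned on CNN/Daily Mail, one of the listed datasets) and the document-concatenation strategy, a common MDS baseline, are assumptions.

```python
# Minimal sketch: multi-document summarization with BART via Hugging Face
# transformers. Concatenating source documents into one input is a common
# MDS baseline; this is illustrative, not the paper's exact pipeline.
from transformers import BartForConditionalGeneration, BartTokenizer

# Assumed checkpoint: BART fine-tuned on CNN/Daily Mail single-doc summaries.
MODEL_NAME = "facebook/bart-large-cnn"

tokenizer = BartTokenizer.from_pretrained(MODEL_NAME)
model = BartForConditionalGeneration.from_pretrained(MODEL_NAME)

def summarize(documents, max_input_tokens=1024, max_summary_tokens=142):
    # Concatenate the source documents into a single input sequence;
    # anything beyond BART's 1024-token limit is truncated.
    text = " ".join(documents)
    inputs = tokenizer(text, max_length=max_input_tokens,
                       truncation=True, return_tensors="pt")
    summary_ids = model.generate(inputs["input_ids"],
                                 num_beams=4,
                                 max_length=max_summary_tokens,
                                 early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

docs = ["First news article about an event ...",
        "Second article covering the same event ..."]
print(summarize(docs))
```

For the fine-tuned condition, the same model class would be trained further on an MDS dataset before generation; applying the approach to German would additionally require a German or multilingual checkpoint.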