Timo Johner

2021

pdf abs
Error Analysis of using BART for Multi-Document Summarization: A Study for English and German Language
Timo Johner | Abhik Jana | Chris Biemann
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Recent research using pre-trained language models for multi-document summarization task lacks deep investigation of potential erroneous cases and their possible application on other languages. In this work, we apply a pre-trained language model (BART) for multi-document summarization (MDS) task using both fine-tuning and without fine-tuning. We use two English datasets and one German dataset for this study. First, we reproduce the multi-document summaries for English language by following one of the recent studies. Next, we show the applicability of the model to German language by achieving state-of-the-art performance on German MDS. We perform an in-depth error analysis of the followed approach for both languages, which leads us to identifying most notable errors, from made-up facts and topic delimitation, and quantifying the amount of extractiveness.

Co-authors

Abhik Jana 1
Chris Biemann 1

Venues

nodalida1