@inproceedings{frefel-2020-summarization,
    title = "Summarization Corpora of {W}ikipedia Articles",
    author = "Frefel, Dominik",
    editor = "Calzolari, Nicoletta  and
      B{\'e}chet, Fr{\'e}d{\'e}ric  and
      Blache, Philippe  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://preview.aclanthology.org/ingest-emnlp/2020.lrec-1.821/",
    pages = "6651--6655",
    language = "eng",
    ISBN = "979-10-95546-34-4",
    abstract = "In this paper we propose a process to extract summarization corpora from Wikipedia articles. Applied to the German language we create a corpus of 240,000 texts. We use ROUGE scores for the extraction and evaluation of our corpus. For this we provide a ROUGE metric implementation adapted to the German language. The extracted corpus is used to train three abstractive summarization models which we compare to different baselines. The resulting summaries sound natural and cover the input text very well. The corpus can be downloaded at \url{https://github.com/domfr/GeWiki}."
}