@inproceedings{moller-etal-2021-germanquad,
    title = "{G}erman{Q}u{AD} and {G}erman{DPR}: Improving Non-{E}nglish Question Answering and Passage Retrieval",
    author = {M{\"o}ller, Timo  and
      Risch, Julian  and
      Pietsch, Malte},
    editor = "Fisch, Adam  and
      Talmor, Alon  and
      Chen, Danqi  and
      Choi, Eunsol  and
      Seo, Minjoon  and
      Lewis, Patrick  and
      Jia, Robin  and
      Min, Sewon",
    booktitle = "Proceedings of the 3rd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2021.mrqa-1.4/",
    doi = "10.18653/v1/2021.mrqa-1.4",
    pages = "42--50",
    abstract = "A major challenge of research on non-English machine reading for question answering (QA) is the lack of annotated datasets. In this paper, we present GermanQuAD, a dataset of 13,722 extractive question/answer pairs. To improve the reproducibility of the dataset creation approach and foster QA research on other languages, we summarize lessons learned and evaluate reformulation of question/answer pairs as a way to speed up the annotation process. An extractive QA model trained on GermanQuAD significantly outperforms multilingual models and also shows that machine-translated training data cannot fully substitute hand-annotated training data in the target language. Finally, we demonstrate the wide range of applications of GermanQuAD by adapting it to GermanDPR, a training dataset for dense passage retrieval (DPR), and train and evaluate one of the first non-English DPR models."
}