CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild
Yuan Yao, Jiaju Du, Yankai Lin, Peng Li, Zhiyuan Liu, Jie Zhou, Maosong Sun
Abstract
Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents. However, a large quantity of relational facts in knowledge bases can only be inferred across documents in practice. In this work, we present the problem of cross-document RE, making an initial step towards knowledge acquisition in the wild. To facilitate the research, we construct the first human-annotated cross-document RE dataset CodRED. Compared to existing RE datasets, CodRED presents two key challenges: Given two entities, (1) it requires finding the relevant documents that can provide clues for identifying their relations; (2) it requires reasoning over multiple documents to extract the relational facts. We conduct comprehensive experiments to show that CodRED is challenging to existing RE methods including strong BERT-based models.- Anthology ID:
- 2021.emnlp-main.366
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4452–4472
- Language:
- URL:
- https://aclanthology.org/2021.emnlp-main.366
- DOI:
- 10.18653/v1/2021.emnlp-main.366
- Cite (ACL):
- Yuan Yao, Jiaju Du, Yankai Lin, Peng Li, Zhiyuan Liu, Jie Zhou, and Maosong Sun. 2021. CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4452–4472, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild (Yao et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.366.pdf
- Code
- thunlp/codred
- Data
- BC5CDR, DocRED, FewRel, KnowledgeNet