Relation Extraction across Entire Books to Reconstruct Community Networks: The AffilKG Datasets

Erica Cai, Sean Mcquade, Kevin Young, Brendan O'Connor


Abstract
When knowledge graphs (KGs) are automatically extracted from text, are they accurate enough for downstream analysis? Current annotated datasets cannot answer this question: the KGs they yield, built by mapping entities in the text to nodes and relations to edges, are typically highly disconnected, too small, or overly complex. To address this gap, we introduce AffilKG, a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs. Each dataset features affiliation graphs, simple KGs that capture Member relationships between Person and Organization entities and that are useful in studies of migration, community interactions, and other social phenomena. In addition, three datasets include expanded KGs with a wider variety of relation types. Our preliminary experiments demonstrate significant variability in model performance across datasets, underscoring AffilKG’s ability to enable two critical advances: (1) benchmarking how extraction errors propagate to graph-level analyses (e.g., community structure), and (2) validating KG extraction methods for real-world social science research.
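As a rough illustration of the affiliation-graph idea in the abstract, the sketch below stores a graph as a set of (person, organization) Member edges and compares an extracted graph against a gold one at two levels: individual edges, and the person-to-person co-membership pairs derived from them. It is a minimal sketch, not the paper's evaluation protocol; the function names `edge_prf` and `co_membership` and the toy names are invented for this example and are not drawn from AffilKG.

```python
from itertools import combinations


def edge_prf(gold, pred):
    """Precision, recall, and F1 between two sets of hashable items."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def co_membership(edges):
    """Project (person, org) edges onto persons: pairs sharing an organization."""
    members = {}
    for person, org in edges:
        members.setdefault(org, set()).add(person)
    pairs = set()
    for people in members.values():
        pairs.update(frozenset(p) for p in combinations(sorted(people), 2))
    return pairs


# Toy data, invented for illustration only (not from AffilKG).
gold = {("Ana", "Choir"), ("Ben", "Choir"), ("Ben", "Guild"), ("Cy", "Guild")}
pred = {("Ana", "Choir"), ("Ben", "Guild"), ("Cy", "Guild"), ("Cy", "Choir")}

# Edge-level agreement between extracted and gold Member edges.
edge_scores = edge_prf(gold, pred)
# Graph-level agreement on the derived person-person co-membership pairs.
pair_scores = edge_prf(co_membership(gold), co_membership(pred))

print("edge P/R/F1:", edge_scores)
print("co-membership P/R/F1:", pair_scores)
```

In this toy case a single misattributed membership leaves edge-level F1 at 0.75 while co-membership F1 falls to 0.5, which is one simple way an extraction error can weigh more heavily on a graph-level quantity than on the edge list itself.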
Anthology ID:
2026.lrec-main.615
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
7744–7754
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.615/
Cite (ACL):
Erica Cai, Sean Mcquade, Kevin Young, and Brendan O'Connor. 2026. Relation Extraction across Entire Books to Reconstruct Community Networks: The AffilKG Datasets. International Conference on Language Resources and Evaluation, main:7744–7754.
Cite (Informal):
Relation Extraction across Entire Books to Reconstruct Community Networks: The AffilKG Datasets (Cai et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.615.pdf