Graph-TempCZ: A Graph Representation of Software Mentions for Predicting Software Usage in Scientific Publications

Congfeng Cao, Pengyu Zhang, Jelke Bloem


Abstract
Predicting how software is used, shared, and evolves across publications is essential to studying scientific progress. Existing methods for representing software usage in publications rely mainly on tabular or textual formats, which limit their structural expressiveness and consequently their ability to predict software usage. We address this limitation by representing software mentions and citations as a graph and formulating software usage prediction as a link prediction task. To support this study, we construct the first large-scale graph dataset of publications and software mentions, Graph-TempCZ, covering 1959–2022 with over six million mention relationships. Experiments using both traditional machine learning and Graph Neural Network (GNN) models show that graph-based models substantially outperform feature-based baselines, achieving a 5.98% improvement in test accuracy. Temporal experiments further reveal that models trained on one year generalize effectively to nearby years but show gradual performance decay as the temporal gap increases. This work provides the first comprehensive foundation for analyzing software usage through a temporal graph representation.
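To make the link prediction framing concrete, the following is a minimal sketch and not the authors' method. It assumes mentions are available as (publication, software, year) triples; the toy data, the function names, and the path-counting heuristic are all hypothetical illustrations. The sketch builds a bipartite publication-software graph, optionally restricted to mentions up to a given year, and ranks unseen software for a publication by counting 3-hop paths through publications that share software with it.

```python
from collections import defaultdict

# Hypothetical toy data: (publication_id, software_name, year) mention triples.
MENTIONS = [
    ("p1", "numpy", 2019),
    ("p1", "scipy", 2019),
    ("p2", "numpy", 2020),
    ("p2", "scipy", 2020),
    ("p2", "pandas", 2020),
    ("p3", "numpy", 2021),
    ("p3", "imagej", 2021),
]


def build_graph(mentions, max_year=None):
    """Build bipartite adjacency maps, keeping mentions up to max_year if given."""
    pub_to_sw = defaultdict(set)
    sw_to_pub = defaultdict(set)
    for pub, sw, year in mentions:
        if max_year is not None and year > max_year:
            continue
        pub_to_sw[pub].add(sw)
        sw_to_pub[sw].add(pub)
    return pub_to_sw, sw_to_pub


def link_score(pub, sw, pub_to_sw, sw_to_pub):
    """Count 3-hop paths pub -> known software -> other publication -> sw."""
    score = 0
    for known_sw in pub_to_sw.get(pub, ()):
        for other_pub in sw_to_pub.get(known_sw, ()):
            if other_pub != pub and sw in pub_to_sw.get(other_pub, ()):
                score += 1
    return score


# Rank software that p1 does not yet mention, using the full graph.
pub_to_sw, sw_to_pub = build_graph(MENTIONS)
candidates = sorted(sw_to_pub.keys() - pub_to_sw["p1"])
ranked = sorted(
    candidates,
    key=lambda s: link_score("p1", s, pub_to_sw, sw_to_pub),
    reverse=True,
)
print(ranked)
```

Passing max_year to build_graph gives a simple way to imitate a temporal split, for example scoring links from a graph built on earlier years and checking them against mentions from later years. A learned model such as a GNN would replace the hand-written link_score heuristic.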
Anthology ID:
2026.lrec-main.619
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
7791–7803
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.619/
Cite (ACL):
Congfeng Cao, Pengyu Zhang, and Jelke Bloem. 2026. Graph-TempCZ: A Graph Representation of Software Mentions for Predicting Software Usage in Scientific Publications. International Conference on Language Resources and Evaluation, main:7791–7803.
Cite (Informal):
Graph-TempCZ: A Graph Representation of Software Mentions for Predicting Software Usage in Scientific Publications (Cao et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.619.pdf