From In-Distribution to Out-of-Distribution: Joint Loss for Improving Generalization in Software Mention and Relation Extraction

Stasa Mandic, Georg Niess, Roman Kern


Abstract
Identifying software entities and their semantic relations in scientific texts is key for reproducibility and machine-readable knowledge graphs, yet models struggle with domain variability and sparse supervision. We address this by evaluating joint Named Entity Recognition (NER) and Relation Extraction (RE) models on the SOMD 2025 shared task, emphasizing generalization to out-of-domain scholarly texts. We propose a unified training objective that jointly optimizes both tasks using a shared loss function, and we demonstrate that joint loss formulations can improve out-of-domain robustness compared to disjoint training. Our results reveal significant performance gaps between in- and out-of-domain settings, prompting critical reflections on modeling strategies for software knowledge extraction. Notably, our approach ranked 1st in Phase 2 (out-of-distribution) and 2nd in Phase 1 (in-distribution) in the SOMD 2025 shared task, showing strong generalization and robust performance across domains.
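The abstract does not spell out the exact form of the shared loss. As an illustration only, one common way to train two tasks jointly is to add their per-task cross-entropy losses, with a weight on the second term. The sketch below assumes that formulation; the names `cross_entropy`, `joint_loss`, and `re_weight`, as well as the toy probabilities, are hypothetical and not taken from the paper.

```python
import math


def cross_entropy(probs, gold):
    """Mean negative log-likelihood of the gold labels.

    probs: list of predicted probability distributions, one per item.
    gold:  list of gold label indices, one per item.
    """
    if not probs:
        return 0.0
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(probs)


def joint_loss(ner_probs, ner_gold, re_probs, re_gold, re_weight=1.0):
    """Assumed joint objective: NER loss plus a weighted RE loss.

    re_weight is a hypothetical hyperparameter; re_weight=0 reduces
    the objective to the NER loss alone.
    """
    ner_term = cross_entropy(ner_probs, ner_gold)
    re_term = cross_entropy(re_probs, re_gold)
    return ner_term + re_weight * re_term


# Toy inputs: two tokens over three NER tags, one entity pair over two relation labels.
ner_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
ner_gold = [0, 1]
re_probs = [[0.4, 0.6]]
re_gold = [1]

loss = joint_loss(ner_probs, ner_gold, re_probs, re_gold, re_weight=0.5)
print(f"joint loss: {loss:.4f}")
```

In a sketch like this, both terms would be backpropagated through a shared encoder in one update, whereas disjoint training would optimize each term with a separate model. How the paper actually combines or weights the two terms should be checked against the full text.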
Anthology ID:
2025.sdp-1.14
Volume:
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Tirthankar Ghosal, Philipp Mayr, Amanpreet Singh, Aakanksha Naik, Georg Rehm, Dayne Freitag, Dan Li, Sonja Schimmler, Anita De Waard
Venues:
sdp | WS
Publisher:
Association for Computational Linguistics
Pages:
146–153
URL:
https://preview.aclanthology.org/landing_page/2025.sdp-1.14/
DOI:
10.18653/v1/2025.sdp-1.14
Cite (ACL):
Stasa Mandic, Georg Niess, and Roman Kern. 2025. From In-Distribution to Out-of-Distribution: Joint Loss for Improving Generalization in Software Mention and Relation Extraction. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 146–153, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
From In-Distribution to Out-of-Distribution: Joint Loss for Improving Generalization in Software Mention and Relation Extraction (Mandic et al., sdp 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.sdp-1.14.pdf