SOMD2025: A Challenging Shared Tasks for Software Related Information Extraction

Sharmila Upadhyaya, Wolfgang Otto, Frank Krüger, Stefan Dietze


Abstract
The use of software in acquiring, analyzing, and interpreting research data underscores its role as an essential artifact of scientific inquiry. Understanding and tracing the provenance of software in research supports reproducible and collaborative research. In this paper, we present an overview of the second iteration of the Software Mention Detection (SOMD) shared task, held as part of the Scholarly Document Processing (SDP) workshop in conjunction with ACL 2025. We aim to encourage participants to develop optimized methods for software mention detection, as well as for extracting additional attributes and relations, on the provided gold-standard benchmark. The shared task consists of two phases. In Phase I, participants focus on implementing a joint framework for named entity recognition (NER) and relation extraction (RE) on the given dataset. Phase II adds an out-of-distribution dataset to evaluate the generalizability of the methods proposed in Phase I. The competition (March-April 2025) spanned two months and attracted 18 participants. Four teams completed the competition and submitted full system descriptions. Participants applied various approaches, including joint and pipeline models, and explored data augmentation with LLM-generated samples. The evaluation was based on macro-F1 scores for both NER and RE, with their average reported as the SOMD-score. The winning teams achieved a SOMD-score of 0.89 in Phase I and 0.63 in Phase II, demonstrating the challenge of generalization.
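The SOMD-score described in the abstract averages a macro-F1 for NER with a macro-F1 for RE. The following is a minimal Python sketch of that arithmetic, not the official shared-task scorer. It assumes exact-match scoring of entity spans and relation tuples and an unweighted mean of per-label F1 scores; the tuple formats, the label names (`Application`, `Version`, `Version_of`), and the function names (`macro_f1`, `somd_score`) are hypothetical, and the actual evaluation may differ (for example, by scoring at the token level).

```python
from collections import defaultdict
from typing import Iterable, Tuple

# Hypothetical item formats (the label is always the last field):
#   entity:   (doc_id, start, end, label)
#   relation: (doc_id, head_span, tail_span, label)
Item = Tuple


def macro_f1(gold: Iterable[Item], pred: Iterable[Item]) -> float:
    """Exact-match F1 computed per label, then averaged with equal weight."""
    gold_by_label = defaultdict(set)
    pred_by_label = defaultdict(set)
    for item in gold:
        gold_by_label[item[-1]].add(item)
    for item in pred:
        pred_by_label[item[-1]].add(item)

    # Labels seen only in predictions count as F1 = 0 (assumed design choice).
    labels = set(gold_by_label) | set(pred_by_label)
    if not labels:
        return 0.0

    total = 0.0
    for label in labels:
        g, p = gold_by_label[label], pred_by_label[label]
        tp = len(g & p)  # true positives: identical tuples in gold and prediction
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        total += 2 * precision * recall / denom if denom else 0.0
    return total / len(labels)


def somd_score(gold_ents, pred_ents, gold_rels, pred_rels) -> float:
    """Mean of NER macro-F1 and RE macro-F1 (assumed reading of the SOMD-score)."""
    return (macro_f1(gold_ents, pred_ents) + macro_f1(gold_rels, pred_rels)) / 2


# Illustrative usage with made-up annotations for one document.
gold_ents = {("d1", 0, 5, "Application"), ("d1", 10, 12, "Version")}
pred_ents = {("d1", 0, 5, "Application")}  # the Version mention is missed
gold_rels = {("d1", (10, 12), (0, 5), "Version_of")}
pred_rels = {("d1", (10, 12), (0, 5), "Version_of")}
print(somd_score(gold_ents, pred_ents, gold_rels, pred_rels))  # → 0.75
```

Here the NER macro-F1 is 0.5 (1.0 for `Application`, 0.0 for the missed `Version`) and the RE macro-F1 is 1.0, so the averaged score is 0.75. Because macro-averaging weights every label equally, missing a rare label costs as much as missing a frequent one, which is one reason such a metric can expose weaknesses under the distribution shift of Phase II.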
Anthology ID: 2025.sdp-1.13
Volume: Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Tirthankar Ghosal, Philipp Mayr, Amanpreet Singh, Aakanksha Naik, Georg Rehm, Dayne Freitag, Dan Li, Sonja Schimmler, Anita De Waard
Venues: sdp | WS
Publisher: Association for Computational Linguistics
Pages: 137–145
URL: https://preview.aclanthology.org/landing_page/2025.sdp-1.13/
DOI: 10.18653/v1/2025.sdp-1.13
Cite (ACL): Sharmila Upadhyaya, Wolfgang Otto, Frank Krüger, and Stefan Dietze. 2025. SOMD2025: A Challenging Shared Tasks for Software Related Information Extraction. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 137–145, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): SOMD2025: A Challenging Shared Tasks for Software Related Information Extraction (Upadhyaya et al., sdp 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.sdp-1.13.pdf