SOMD 2025: Fine-tuning ModernBERT for In- and Out-of-Distribution NER and Relation Extraction of Software Mentions in Scientific Texts
Vaghawan Ojha | Projan Shakya | Kristina Ghimire | Kashish Bataju | Ashwini Mandal | Sadikshya Gyawali | Manish Dahal | Manish Awale | Shital Adhikari | Sanjay Rijal
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Software mentions are ubiquitous yet remain irregularly referenced in scientific texts. In this paper, we utilized the dataset and evaluation criteria defined by the Software Mention Detection (SOMD 2025) competition to solve the problem of Named Entity Recognition (NER) and Relation Extraction (RE) in input sentences from scientific texts. During the competition, we achieved a leading F1 SOMD score of 0.89 in Phase I by first fine-tuning ModernBERT for NER, and then using the extracted entity pairs for RE. Additionally, we trained a model that jointly optimizes entity and relation losses, improving the F1 SOMD score to 0.92. Retraining the same model on an augmented dataset, we achieved the second-best F1 SOMD score of 0.55 in Phase II. In the Open Submission phase, we experimented with adaptive fine-tuning, achieving an F1 SOMD score of 0.6, with the best macro average for NER being 0.69. Our work shows the effectiveness of fine-tuning for a niche task like software mention detection despite limited data, and the promise of adaptive fine-tuning on an out-of-distribution (OOD) dataset.
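The first-stage NER fine-tuning described in the abstract can be illustrated with a minimal sketch using the Hugging Face transformers API, assuming a recent release with ModernBERT support and the publicly available answerdotai/ModernBERT-base checkpoint. The BIO label set, data file name, and hyperparameters below are placeholders for illustration, not the authors' actual SOMD 2025 configuration.

```python
# Minimal sketch (not the authors' code): fine-tuning ModernBERT for
# token-level NER of software mentions with Hugging Face Transformers.
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset

MODEL_NAME = "answerdotai/ModernBERT-base"      # assumed checkpoint
LABELS = ["O", "B-Software", "I-Software"]      # placeholder BIO label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

def tokenize_and_align(example):
    """Tokenize pre-split words and align BIO label ids to sub-word tokens.

    Assumes each example has `tokens` (list of words) and `ner_tags`
    (list of integer label ids, one per word).
    """
    enc = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    labels = []
    for word_id in enc.word_ids():
        # Special tokens get -100 so they are ignored by the loss.
        labels.append(-100 if word_id is None else example["ner_tags"][word_id])
    enc["labels"] = labels
    return enc

# "somd_ner_train.json" is a hypothetical file name for the SOMD training split.
dataset = load_dataset("json", data_files="somd_ner_train.json")["train"]
dataset = dataset.map(tokenize_and_align)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="modernbert-ner", num_train_epochs=5),
    train_dataset=dataset,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```

In the pipeline sketched above, the entity spans predicted by this model would then be paired and passed to a second classifier for relation extraction; the joint model mentioned in the abstract instead optimizes the entity and relation losses together.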