Augmented Bio-SBERT: Improving Performance for Pairwise Sentence Tasks in Bio-medical Domain

Sonam Pankaj, Amit Gautam


Abstract
Access to high-quality annotated data is one of the current challenges in AI, especially in NLP, which is why data augmentation is gaining importance. Image augmentation is standard practice in computer vision, but text augmentation in NLP is harder because of the complexity of language. Augmentation is most beneficial when little data is available, where it can significantly improve a model’s accuracy. We apply augmentation to pairwise sentence scoring in the biomedical domain. Experimenting on downstream biomedical tasks, we investigate how to improve the performance of bi-encoder sentence transformers using an augmented dataset generated by cross-encoders, which are fine-tuned on BIOSSES and MedNLI starting from the pre-trained Bio-BERT model. This significantly improves results over a model trained only on gold data for the respective tasks.
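The augmentation step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the paper uses a cross-encoder fine-tuned from Bio-BERT to label new sentence pairs, whereas the hypothetical `token_overlap_score` function below is only a stand-in so the example runs without downloading a model. Pairing every sentence with every other sentence is also an assumption, since the abstract does not say how new pairs are sampled. All names and example sentences are illustrative.

```python
from itertools import combinations
from typing import Callable, List, Tuple

# A labeled sentence pair: (sentence_a, sentence_b, similarity score in [0, 1]).
Pair = Tuple[str, str, float]


def token_overlap_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a fine-tuned cross-encoder.

    Returns the Jaccard overlap of lowercased whitespace tokens.
    A real pipeline would call the cross-encoder's scoring function here.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def build_silver_pairs(
    sentences: List[str],
    gold: List[Pair],
    scorer: Callable[[str, str], float],
) -> List[Pair]:
    """Score every sentence pair that is not already in the gold set."""
    seen = {frozenset((a, b)) for a, b, _ in gold}
    silver: List[Pair] = []
    for a, b in combinations(sentences, 2):
        if frozenset((a, b)) in seen:
            continue  # keep the human label for pairs that already have one
        silver.append((a, b, scorer(a, b)))
    return silver


def augment(gold: List[Pair], scorer: Callable[[str, str], float]) -> List[Pair]:
    """Return gold pairs followed by machine-labeled (silver) pairs."""
    sentences = sorted({s for a, b, _ in gold for s in (a, b)})
    return gold + build_silver_pairs(sentences, gold, scorer)


# Illustrative gold data (made up for this sketch, not taken from BIOSSES).
gold = [
    ("aspirin reduces fever", "aspirin lowers fever", 0.9),
    ("insulin regulates glucose", "the patient has a fracture", 0.0),
]
augmented = augment(gold, token_overlap_score)
```

In this sketch, `augmented` would then serve as the training set for the bi-encoder, with the silver labels standing in for the missing human annotations.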
Anthology ID:
2022.loresmt-1.6
Volume:
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
LoResMT
Publisher:
Association for Computational Linguistics
Pages:
43–47
URL:
https://aclanthology.org/2022.loresmt-1.6
Cite (ACL):
Sonam Pankaj and Amit Gautam. 2022. Augmented Bio-SBERT: Improving Performance for Pairwise Sentence Tasks in Bio-medical Domain. In Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022), pages 43–47, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Augmented Bio-SBERT: Improving Performance for Pairwise Sentence Tasks in Bio-medical Domain (Pankaj & Gautam, LoResMT 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.loresmt-1.6.pdf
Optional supplementary material:
 2022.loresmt-1.6.OptionalSupplementaryMaterial.zip
Data
BIOSSES, BLUE