TüDuo at SemEval-2024 Task 2: Flan-T5 and Data Augmentation for Biomedical NLI

Veronika Smilga, Hazem Alabiad


Abstract
This paper explores data augmentation with smaller language models (under 3 billion parameters) for SemEval-2024 Task 2 on Biomedical Natural Language Inference for Clinical Trials. We fine-tune models from the Flan-T5 family with and without augmented data automatically generated by GPT-3.5-Turbo. We find that augmentation through synonym replacement, syntactic changes, adding random facts, and meaning reversion improves model faithfulness (the ability to change predictions for semantically different inputs) and consistency (the ability to give the same predictions under semantics-preserving changes). However, data augmentation tends to decrease performance on the original dataset distribution, as measured by F1 score. Our best system is the Flan-T5 XL model fine-tuned on the original training data combined with over 6,000 augmented examples. The system ranks in the top 10 for all three metrics.
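The two robustness notions defined in the abstract can be illustrated with a small sketch. This is not the official task scorer or the authors' code: the function names, label strings, and pairing of original and perturbed predictions are assumptions made for illustration, and the sketch follows only the parenthetical definitions given in the abstract.

```python
from typing import Sequence


def _check(orig_preds: Sequence[str], pert_preds: Sequence[str]) -> None:
    """Ensure both prediction lists are non-empty and aligned pair by pair."""
    if len(orig_preds) != len(pert_preds):
        raise ValueError("prediction lists must have the same length")
    if not orig_preds:
        raise ValueError("prediction lists must not be empty")


def consistency(orig_preds: Sequence[str], pert_preds: Sequence[str]) -> float:
    """Share of semantics-preserving perturbations where the prediction stays the same."""
    _check(orig_preds, pert_preds)
    unchanged = sum(o == p for o, p in zip(orig_preds, pert_preds))
    return unchanged / len(orig_preds)


def faithfulness(orig_preds: Sequence[str], pert_preds: Sequence[str]) -> float:
    """Share of semantics-altering perturbations where the prediction changes."""
    _check(orig_preds, pert_preds)
    changed = sum(o != p for o, p in zip(orig_preds, pert_preds))
    return changed / len(orig_preds)


# Hypothetical predictions on paraphrased (meaning-preserving) inputs.
preserving_orig = ["Entailment", "Contradiction", "Entailment"]
preserving_pert = ["Entailment", "Entailment", "Entailment"]

# Hypothetical predictions on meaning-altering inputs.
altering_orig = ["Entailment", "Contradiction"]
altering_pert = ["Contradiction", "Contradiction"]

consistency_score = consistency(preserving_orig, preserving_pert)
faithfulness_score = faithfulness(altering_orig, altering_pert)
```

Under these assumptions, a model scores high on consistency when paraphrases leave its output unchanged, and high on faithfulness when meaning-altering edits flip its output. The official evaluation may restrict which examples are counted, so this sketch should be read as an approximation.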
Anthology ID:
2024.semeval-1.106
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
737–744
URL:
https://aclanthology.org/2024.semeval-1.106
Cite (ACL):
Veronika Smilga and Hazem Alabiad. 2024. TüDuo at SemEval-2024 Task 2: Flan-T5 and Data Augmentation for Biomedical NLI. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 737–744, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
TüDuo at SemEval-2024 Task 2: Flan-T5 and Data Augmentation for Biomedical NLI (Smilga & Alabiad, SemEval 2024)
PDF:
https://preview.aclanthology.org/ingestion-checklist/2024.semeval-1.106.pdf
Supplementary material:
 2024.semeval-1.106.SupplementaryMaterial.txt
Supplementary material:
 2024.semeval-1.106.SupplementaryMaterial.zip