SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions

Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge, Karl Pazdernik


Abstract
Instruction finetuning is a popular paradigm for aligning large language models (LLMs) with human intent. Despite its popularity, this idea is less explored for aligning existing foundation models with scientific disciplines, concepts, and goals. In this work, we present SciTune, a tuning framework to improve the ability of LLMs to follow multimodal instructions generated from scientific publications. To test our methodology, we train a large multimodal model, LLaMA-SciTune, that connects a vision encoder and an LLM for science-focused visual and language understanding. LLaMA-SciTune significantly outperforms state-of-the-art models on figure-type classification and caption generation in the SciCap and VisText benchmarks. Compared to models finetuned with synthetic data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark. Our results demonstrate that human-generated scientific multimodal instructions remain highly valuable for tuning LLMs to perform well on science tasks, despite their lower volume and relative scarcity compared to synthetic data.
Anthology ID:
2024.nlp4science-1.7
Volume:
Proceedings of the 1st Workshop on NLP for Science (NLP4Science)
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Lotem Peled-Cohen, Nitay Calderon, Shir Lissak, Roi Reichart
Venue:
NLP4Science
Publisher:
Association for Computational Linguistics
Pages:
58–72
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.nlp4science-1.7/
DOI:
10.18653/v1/2024.nlp4science-1.7
Cite (ACL):
Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge, and Karl Pazdernik. 2024. SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions. In Proceedings of the 1st Workshop on NLP for Science (NLP4Science), pages 58–72, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions (Horawalavithana et al., NLP4Science 2024)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.nlp4science-1.7.pdf