Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi


Abstract
Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a novel approach to enhance the data coverage to be used in domain-specific instruction-tuning through active exploration via Large Language Models (LLMs). Built upon representative domain use cases, Explore-Instruct explores a multitude of variations or possibilities by implementing a search algorithm to obtain diversified and domain-focused instruction-tuning data. Our data-centric analysis validates the effectiveness of this proposed approach in improving domain-specific instruction coverage. Moreover, our model’s performance demonstrates considerable advancements over multiple baselines, including those utilizing domain-specific data enhancement. Our findings offer a promising opportunity to improve instruction coverage, especially in domain-specific contexts, thereby advancing the development of adaptable language models. Our code, model weights, and data are public at https://github.com/fanqiwan/Explore-Instruct.
Anthology ID:
2023.emnlp-main.587
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
9435–9454
Language:
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.587/
DOI:
10.18653/v1/2023.emnlp-main.587
Bibkey:
Cite (ACL):
Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, and Shuming Shi. 2023. Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9435–9454, Singapore. Association for Computational Linguistics.
Cite (Informal):
Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration (Wan et al., EMNLP 2023)
Copy Citation:
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.587.pdf
Video:
 https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.587.mp4