CRAFT: Extracting and Tuning Cultural Instructions from the Wild

Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, Nancy Chen


Abstract
Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally-related concepts and reasoning remains limited. Meanwhile, there is a significant need to enhance these models’ cultural reasoning capabilities, especially concerning underrepresented regions. This paper introduces a novel pipeline for extracting high-quality, culturally-related instruction tuning datasets from vast unstructured corpora. We utilize a self-instruction generation pipeline to identify cultural concepts and trigger instruction. When the extracted data is combined with a general-purpose instruction tuning dataset, our model demonstrates enhanced capabilities in recognizing and understanding regional cultural nuances, thereby improving its reasoning capabilities. We conduct experiments across three regions: Singapore, the Philippines, and the United States, achieving performance improvements of up to 6%. Our research opens new avenues for extracting cultural instruction tuning sets directly from unstructured data, setting a precedent for future innovations in the field.
Anthology ID:
2024.c3nlp-1.4
Volume:
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Vinodkumar Prabhakaran, Sunipa Dev, Luciana Benotti, Daniel Hershcovich, Laura Cabello, Yong Cao, Ife Adebara, Li Zhou
Venues:
C3NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
42–47
URL:
https://aclanthology.org/2024.c3nlp-1.4
Cite (ACL):
Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, and Nancy Chen. 2024. CRAFT: Extracting and Tuning Cultural Instructions from the Wild. In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 42–47, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
CRAFT: Extracting and Tuning Cultural Instructions from the Wild (Wang et al., C3NLP-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.c3nlp-1.4.pdf