Abstract
Existing large language models struggle to support numerous low-resource languages, particularly extremely low-resource ones, for which minimal training data is available for effective parameter updating. We thus investigate whether LLMs can learn a new language on the fly solely through prompting. To study this question, we collect a research suite for Zhuang, a language currently supported by no LLMs. We introduce DiPMT++, a framework for adapting LLMs to unseen languages via in-context learning. Using only a dictionary and 5K parallel sentences, DiPMT++ significantly improves GPT-4's performance from 0 to 16 BLEU for Chinese-to-Zhuang translation and achieves 32 BLEU for Zhuang-to-Chinese translation. We also validate the effectiveness of our framework on Kalamang, another unseen language. Furthermore, we demonstrate the practical utility of DiPMT++ in aiding humans in translating completely unseen languages, which could contribute to the preservation of linguistic diversity.
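The core idea of this kind of on-the-fly adaptation is to pack the available resources, dictionary glosses for the words of the input sentence and a handful of retrieved parallel sentences, into a single prompt. The sketch below is a minimal, hypothetical illustration of that idea; the prompt wording, the lexical-overlap retrieval heuristic, and the data structures are assumptions for illustration and do not reproduce the exact DiPMT++ pipeline described in the paper.

```python
# Illustrative sketch of dictionary-and-exemplar prompting for an unseen language.
# The retrieval heuristic and prompt format are assumptions, not the authors' exact
# DiPMT++ implementation.

from typing import Dict, List, Tuple


def build_prompt(
    source_sentence: str,
    dictionary: Dict[str, str],              # Zhuang word -> Chinese gloss
    parallel_corpus: List[Tuple[str, str]],  # (Zhuang, Chinese) sentence pairs
    num_exemplars: int = 5,
) -> str:
    """Assemble an in-context-learning prompt from dictionary glosses for the
    words in the input sentence plus a few parallel sentences that share
    vocabulary with it."""
    words = source_sentence.split()

    # Look up each input word in the bilingual dictionary.
    gloss_lines = [f"{w} means {dictionary[w]}" for w in words if w in dictionary]

    # Retrieve exemplars by simple lexical overlap with the input sentence
    # (a stand-in for whatever retrieval strategy the actual system uses).
    def overlap(pair: Tuple[str, str]) -> int:
        return len(set(pair[0].split()) & set(words))

    exemplars = sorted(parallel_corpus, key=overlap, reverse=True)[:num_exemplars]
    exemplar_lines = [f"Zhuang: {zh}\nChinese: {cn}" for zh, cn in exemplars]

    return (
        "Translate the Zhuang sentence into Chinese.\n\n"
        "Dictionary entries:\n" + "\n".join(gloss_lines) + "\n\n"
        "Example translations:\n" + "\n\n".join(exemplar_lines) + "\n\n"
        f"Zhuang: {source_sentence}\nChinese:"
    )
```

The resulting string would then be sent as a single prompt to a model such as GPT-4, which produces the translation without any parameter updates.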
- Anthology ID: 2024.findings-acl.519
- Volume: Findings of the Association for Computational Linguistics: ACL 2024
- Month: August
- Year: 2024
- Address: Bangkok, Thailand
- Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 8783–8800
- URL: https://preview.aclanthology.org/icon-24-ingestion/2024.findings-acl.519/
- DOI: 10.18653/v1/2024.findings-acl.519
- Cite (ACL): Chen Zhang, Xiao Liu, Jiuheng Lin, and Yansong Feng. 2024. Teaching Large Language Models an Unseen Language on the Fly. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8783–8800, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal): Teaching Large Language Models an Unseen Language on the Fly (Zhang et al., Findings 2024)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/2024.findings-acl.519.pdf