Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models
Sang Truong, Duc Nguyen, Toan Nguyen, Dong Le, Nhi Truong, Tho Quan, Sanmi Koyejo
Abstract
Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-sourced LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM evaluation. To mitigate these issues, we have finetuned LLMs specifically for Vietnamese and developed a comprehensive evaluation framework encompassing 10 tasks and 31 metrics. We observe that finetuning can help LLMs transfer knowledge across languages, serving as an efficient way to bolster their capabilities in non-English languages. Moreover, our analysis indicates that larger models can introduce more biases and uncalibrated outputs and the key factor influencing LLM performance is the quality of the training or finetuning datasets. These insights underscore the significance of meticulous finetuning with high-quality datasets in enhancing LLM performance.- Anthology ID:
- 2024.findings-naacl.182
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2024
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 2849–2900
- Language:
- URL:
- https://aclanthology.org/2024.findings-naacl.182
- DOI:
- Cite (ACL):
- Sang Truong, Duc Nguyen, Toan Nguyen, Dong Le, Nhi Truong, Tho Quan, and Sanmi Koyejo. 2024. Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 2849–2900, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models (Truong et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.findings-naacl.182.pdf