ViGPTQA - State-of-the-Art LLMs for Vietnamese Question Answering: System Overview, Core Models Training, and Evaluations
Minh Thuan Nguyen, Khanh Tung Tran, Nhu Van Nguyen, Xuan-Son Vu
Abstract
Large language models (LLMs) and their applications in low-resource languages (such as in Vietnamese) are limited due to lack of training data and benchmarking datasets. This paper introduces a practical real-world implementation of a question answering system for Vietnamese, called ViGPTQA, leveraging the power of LLM. Since there is no effective LLM in Vietnamese to date, we also propose, evaluate, and open-source an instruction-tuned LLM for Vietnamese, named ViGPT. ViGPT demonstrates exceptional performances, especially on real-world scenarios. We curate a new set of benchmark datasets that encompass both AI and human-generated data, providing a comprehensive evaluation framework for Vietnamese LLMs. By achieving state-of-the-art results and approaching other multilingual LLMs, our instruction-tuned LLM underscores the need for dedicated Vietnamese-specific LLMs. Our open-source model supports customized and privacy-fulfilled Vietnamese language processing systems.- Anthology ID:
- 2023.emnlp-industry.70
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Mingxuan Wang, Imed Zitouni
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 754–764
- Language:
- URL:
- https://aclanthology.org/2023.emnlp-industry.70
- DOI:
- 10.18653/v1/2023.emnlp-industry.70
- Cite (ACL):
- Minh Thuan Nguyen, Khanh Tung Tran, Nhu Van Nguyen, and Xuan-Son Vu. 2023. ViGPTQA - State-of-the-Art LLMs for Vietnamese Question Answering: System Overview, Core Models Training, and Evaluations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 754–764, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- ViGPTQA - State-of-the-Art LLMs for Vietnamese Question Answering: System Overview, Core Models Training, and Evaluations (Nguyen et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2023.emnlp-industry.70.pdf