Minh Thuan Nguyen
2023
ViGPTQA - State-of-the-Art LLMs for Vietnamese Question Answering: System Overview, Core Models Training, and Evaluations
Minh Thuan Nguyen | Khanh Tung Tran | Nhu Van Nguyen | Xuan-Son Vu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large language models (LLMs) and their applications in low-resource languages (such as Vietnamese) are limited due to a lack of training data and benchmarking datasets. This paper introduces a practical real-world implementation of a question answering system for Vietnamese, called ViGPTQA, leveraging the power of LLMs. Since there is no effective LLM for Vietnamese to date, we also propose, evaluate, and open-source an instruction-tuned LLM for Vietnamese, named ViGPT. ViGPT demonstrates exceptional performance, especially in real-world scenarios. We curate a new set of benchmark datasets that encompass both AI- and human-generated data, providing a comprehensive evaluation framework for Vietnamese LLMs. By achieving state-of-the-art results and approaching the performance of other multilingual LLMs, our instruction-tuned LLM underscores the need for dedicated Vietnamese-specific LLMs. Our open-source model supports customized and privacy-fulfilled Vietnamese language processing systems.
2020
Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs
Minh Thuan Nguyen | Phuong Thai Nguyen | Van Vinh Nguyen | Minh Cong Nguyen Hoang
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
Co-authors
- Khanh Tung Tran 1
- Nhu Van Nguyen 1
- Xuan-Son Vu 1
- Phuong-Thai Nguyen 1
- Van Vinh Nguyen 1