2024
pdf
bib
abs
A Novel Instruction Tuning Method for Vietnamese Mathematical Reasoning using Trainable Open-Source Large Language Models
Quang-Vinh Nguyen
|
Thanh-Do Nguyen
|
Van-Vinh Nguyen
|
Khac-Hoai Nam Bui
Proceedings of the 28th Conference on Computational Natural Language Learning
This study introduces Simple Reasoning with Code (SiRC), a novel instruction fine-tuning method for solving mathematical reasoning problems, particularly effective for Vietnamese, which is considered a low-resource language. Specifically, solving mathematical problems requires strategic and logical reasoning, which remains challenging in this research area. This paper presents a simple yet effective instruction fine-tuning method for mathematical reasoning. Unlike previous approaches, our proposed method effectively combines chain-of-thought reasoning with code transfer methods without requiring a sophisticated inference procedure. Furthermore, we focus on exploiting small open-source large language models (LLMs) for the Vietnamese language. In this regard, we first introduce a trainable Vietnamese math reasoning dataset, which is named ViMath-InstructCode. The proposed dataset is then used for fine-tuning open-source LLMs (e.g., less than 10 billion parameters). Experiments conducted on our custom ViMath-Bench dataset, the largest benchmarking dataset focusing on Vietnamese mathematical problems, indicate the promising results of our proposed method. Our source code and dataset are available for further exploitation.
2022
pdf
bib
abs
A Simple and Fast Strategy for Handling Rare Words in Neural Machine Translation
Nguyen-Hoang Minh-Cong
|
Vinh Thi Ngo
|
Van Vinh Nguyen
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop
Neural Machine Translation (NMT) has currently obtained state-of-the-art in machine translation systems. However, dealing with rare words is still a big challenge in translation systems. The rare words are often translated using a manual dictionary or copied from the source to the target with original words. In this paper, we propose a simple and fast strategy for integrating constraints during the training and decoding process to improve the translation of rare words. The effectiveness of our proposal is demonstrated in both high and low-resource translation tasks, including the language pairs: English → Vietnamese, Chinese → Vietnamese, Khmer → Vietnamese, and Lao → Vietnamese. We show the improvements of up to +1.8 BLEU scores over the baseline systems.
2020
pdf
bib
Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs
Minh Thuan Nguyen
|
Phuong Thai Nguyen
|
Van Vinh Nguyen
|
Minh Cong Nguyen Hoang
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation