ViGLUE: A Vietnamese General Language Understanding Benchmark and Analysis of Vietnamese Language Models

Minh-Nam Tran; Phu-Vinh Nguyen; Long Nguyen; Dinh Dien

Abstract

As the number of language models has grown, various benchmarks have been proposed to assess their proficiency in natural language understanding. However, Vietnamese lacks such a benchmark, owing to the difficulty of accessing natural language processing datasets or the scarcity of task-specific datasets. **ViGLUE**, the proposed dataset collection, is a **Vi**etnamese **G**eneral **L**anguage **U**nderstanding **E**valuation benchmark developed using three methods: translating an existing benchmark, generating new corpora, and collecting available datasets. ViGLUE contains twelve tasks and covers over ten areas and subjects, enabling it to evaluate models comprehensively across a broad range of aspects. Baseline models built on multilingual language models are also provided for all tasks in the proposed benchmark. In addition, a study of the available Vietnamese large language models is conducted to explore their ability in the few-shot learning framework, leading to an exploration of the relationship between specific tasks and the number of shots.

Anthology ID: 2024.findings-naacl.261
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4174–4189
URL: https://aclanthology.org/2024.findings-naacl.261
Cite (ACL): Minh-Nam Tran, Phu-Vinh Nguyen, Long Nguyen, and Dien Dinh. 2024. ViGLUE: A Vietnamese General Language Understanding Benchmark and Analysis of Vietnamese Language Models. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4174–4189, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): ViGLUE: A Vietnamese General Language Understanding Benchmark and Analysis of Vietnamese Language Models (Tran et al., Findings 2024)
PDF: https://preview.aclanthology.org/naacl24-info/2024.findings-naacl.261.pdf