ViLegalLM: Language Models for Vietnamese Legal Text

Truong-Phuc Nguyen, Quy-Nhan Nguyen, Minh-Tien Nguyen


Abstract
We present **ViLegalLM**, comprising **ViLegalBERT** and **ViLegalQwen**, the first suite of Vietnamese pretrained language models for legal text understanding and generation. It includes one encoder-only model (ViLegalBERT, 135M parameters) and two decoder-only models (ViLegalQwen2.5-1.5B-Base and ViLegalQwen3-1.7B-Base), all continually pretrained on a newly curated 16GB Vietnamese legal corpus, significantly larger than the corpora used in previous work. To mitigate data scarcity, we construct three synthetic datasets using LLM-based generation and hard negative mining, covering True/False QA, Multiple Choice QA, and Natural Language Inference. We establish state-of-the-art results among open-source models on four main Vietnamese legal downstream tasks spanning ten benchmarks, demonstrating that continual pretraining from base models consistently outperforms instruction-tuned adaptation. Source code, corpus, datasets, and model checkpoints are publicly available at https://github.com/ntphuc149/ViLegalLM.
Anthology ID:
2026.findings-acl.1801
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
36136–36150
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1801/
Cite (ACL):
Truong-Phuc Nguyen, Quy-Nhan Nguyen, and Minh-Tien Nguyen. 2026. ViLegalLM: Language Models for Vietnamese Legal Text. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36136–36150, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
ViLegalLM: Language Models for Vietnamese Legal Text (Nguyen et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1801.pdf
Checklist:
 2026.findings-acl.1801.checklist.pdf