2025
Enhancing Named Entity Translation from Classical Chinese to Vietnamese in Traditional Vietnamese Medicine Domain: A Hybrid Masking and Dictionary-Augmented Approach
Nhu Vo Quynh Pham | Uyen Bao Nguyen Phuc | Long Hong Buu Nguyen | Dien Dinh
Proceedings of the 18th International Natural Language Generation Conference
Vietnam’s traditional medical texts were historically written in Classical Chinese using Sino-Vietnamese pronunciations. As the Vietnamese language transitioned to a Latin-based national script and interest in integrating traditional medicine with modern healthcare grows, accurate translation of these texts has become increasingly important. However, the diversity of terminology and the complexity of translating medical entities into modern contexts pose significant challenges. To address this, we propose a method that fine-tunes large language models (LLMs) using augmented data and a Hybrid Entity Masking and Replacement (HEMR) strategy to improve named entity translation. We also introduce a parallel named entity translation dataset specifically curated for traditional Vietnamese medicine. Our evaluation across multiple LLMs shows that the proposed approach achieves a translation accuracy of 71.91%, demonstrating its effectiveness. These results underscore the importance of incorporating named entity awareness into translation systems, particularly in low-resource and domain-specific settings like traditional Vietnamese medicine.
2024
ViMedAQA: A Vietnamese Medical Abstractive Question-Answering Dataset and Findings of Large Language Model
Minh-Nam Tran | Phu-Vinh Nguyen | Long Nguyen | Dien Dinh
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Question answering involves creating answers to questions. With the growth of large language models, the ability of question-answering systems has dramatically improved. However, there is a lack of Vietnamese abstractive question-answering datasets, especially in the medical domain. Therefore, this research aims to mitigate this gap by introducing ViMedAQA. This **Vi**etnamese **Med**ical **A**bstractive **Q**uestion-**A**nswering dataset covers four topics in the Vietnamese medical domain, including body parts, disease, drugs and medicine. Additionally, the empirical results on the proposed dataset examine the capability of the large language models in the Vietnamese medical domain, including reasoning, memorizing and awareness of essential information.
ViGLUE: A Vietnamese General Language Understanding Benchmark and Analysis of Vietnamese Language Models
Minh-Nam Tran | Phu-Vinh Nguyen | Long Nguyen | Dien Dinh
Findings of the Association for Computational Linguistics: NAACL 2024
As the number of language models has increased, various benchmarks have been suggested to assess the proficiency of the models in natural language understanding. However, there is a lack of such a benchmark in Vietnamese due to the difficulty in accessing natural language processing datasets or the scarcity of task-specific datasets. **ViGLUE**, the proposed dataset collection, is a **Vi**etnamese **G**eneral **L**anguage **U**nderstanding **E**valuation benchmark developed using three methods: translating an existing benchmark, generating new corpora, and collecting available datasets. ViGLUE contains twelve tasks and encompasses over ten areas and subjects, enabling it to evaluate models comprehensively over a broad spectrum of aspects. Baseline models utilizing multilingual language models are also provided for all tasks in the proposed benchmarks. In addition, the study of the available Vietnamese large language models is conducted to explore the language models’ ability in the few-shot learning framework, leading to the exploration of the relationship between specific tasks and the number of shots.
Advancing Vietnamese Information Retrieval with Learning Objective and Benchmark
Vinh Nguyen | Nam Tran | Long Nguyen | Dien Dinh
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
ViHerbQA: A Robust QA Model for Vietnamese Traditional Herbal Medicine
Quyen Truong | Long Nguyen | Dien Dinh
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
Multi-mask Prefix Tuning: Applying Multiple Adaptive Masks on Deep Prompt Tuning
Qui Tu | Trung Nguyen | Long Nguyen | Dien Dinh
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
VHE: A New Dataset for Event Extraction from Vietnamese Historical Texts
Truc Hoang | Long Nguyen | Dien Dinh
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
A Comparative Study of Chart Summarization
An Chu | Thong Huynh | Long Nguyen | Dien Dinh
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
2022
Multi-level Community-awareness Graph Neural Networks for Neural Machine Translation
Binh Nguyen | Long Nguyen | Dien Dinh
Proceedings of the 29th International Conference on Computational Linguistics
Neural Machine Translation (NMT) aims to translate from the source language to the target language while preserving the original meaning. Linguistic information such as morphology, syntax, and semantics should be captured in token embeddings to produce a high-quality translation. Recent works have leveraged powerful Graph Neural Networks (GNNs) to encode such linguistic knowledge into token embeddings. Specifically, they use a trained parser to construct semantic graphs from sentences and then apply GNNs. However, most semantic graphs are tree-shaped and too sparse for GNNs, which causes the over-smoothing problem. To alleviate this problem, we propose a novel Multi-level Community-awareness Graph Neural Network (MC-GNN) layer to jointly model local and global relationships between words and their linguistic roles in multiple communities. Intuitively, the MC-GNN layer substitutes a self-attention layer at the encoder side of a transformer-based machine translation model. Extensive experiments on four language-pair datasets with common evaluation metrics show the remarkable improvements of our method while reducing the time complexity in very long sentences.
Integrating Label Attention into CRF-based Vietnamese Constituency Parser
Duy Vu-Tran | Phu-Thinh Pham | Duc Do | An-Vinh Luong | Dien Dinh
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation
Intent Detection and Slot Filling from Dependency Parsing Perspective: A Case Study in Vietnamese
Phu-Thinh Pham | Duy Vu-Tran | Duc Do | An-Vinh Luong | Dien Dinh
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation
2020
Identifying Authors Based on Stylometric measures of Vietnamese texts
Ho Ngoc Lam | Vo Diep Nhu | Dinh Dien | Nguyen Tuyet Nhung
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
2014
A Novel Approach for Handling Unknown Word Problem in Chinese-Vietnamese Machine Translation
Phuoc Tran | Dien Dinh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 19, Number 1, March 2014
Building English-Vietnamese Named Entity Corpus with Aligned Bilingual News Articles
Quoc Hung Ngo | Dinh Dien | Werner Winiwarter
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing
2010
An ontology-driven system for detecting global health events
Nigel Collier | Reiko Matsuda Goodwin | John McCrae | Son Doan | Ai Kawazoe | Mike Conway | Asanee Kawtrakul | Koichi Takeuchi | Dinh Dien
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
2003
A hybrid approach to word order transfer in the English-to-Vietnamese machine translation
Dinh Dien | Nguyen Luu Thuy Ngan | Do Xuan Quang | Van Chi Nam
Proceedings of Machine Translation Summit IX: Papers
Word order transfer is a compulsory stage and has a great effect on the translation result of a transfer-based machine translation system. To solve this problem, we can use fixed rules (rule-based) or stochastic methods (corpus-based) that extract word order transfer rules between two languages. However, each approach has its own advantages and disadvantages. In this paper, we present a hybrid approach based on fixed rules and the Transformation-Based Learning (TBL) method. Our purpose is to automatically transfer English word order into Vietnamese word order. The learning process is trained on an annotated bilingual corpus (named EVC: English-Vietnamese Corpus) that has been automatically word-aligned, phrase-aligned and POS-tagged. The transfer result is used by the transfer module in an English-Vietnamese transfer-based machine translation system.
BTL: a hybrid model for English-Vietnamese machine translation
Dinh Dien | Kiem Hoang | Eduard Hovy
Proceedings of Machine Translation Summit IX: Papers
Machine Translation (MT) is the most interesting and difficult task to have been posed since the beginning of computer history. The greatest difficulty that computers have had to face is the built-in ambiguity of natural languages. Formerly, many human-devised rules were used to resolve those ambiguities. Building such a complete rule set is a time-consuming and labor-intensive task, and it still does not cover all cases. Besides, as the scale of the system increases, the rule set becomes very difficult to control. In this paper, we present a new model of learning-based MT (entitled BTL: Bitext-Transfer Learning) that learns from a bilingual corpus to extract disambiguating rules. This model has been tested in an English-to-Vietnamese MT system (EVT) and gave encouraging results.
POS-Tagger for English-Vietnamese Bilingual Corpus
Dinh Dien | Hoang Kiem
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
2002
Building a Training Corpus for Word Sense Disambiguation in English-to-Vietnamese Machine Translation
Dien Dinh
COLING-02: Machine Translation in Asia
2001
An Approach to Parsing Vietnamese Noun Compounds
Dinh Dien | Hoang Kiem
Proceedings of the Seventh International Workshop on Parsing Technologies