Tran Chi Nguyen


2025

DRAGON: Dual-Encoder Retrieval with Guided Ontology Reasoning for Medical Normalization
Dao Sy Duy Minh | Nguyen Lam Phu Quy | Pham Phu Hoa | Tran Chi Nguyen | Huynh Trung Kiet | Truong Bao Tran
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association

Adverse Drug Event (ADE) normalization to standardized medical terminologies such as MedDRA presents significant challenges due to lexical and semantic gaps between colloquial user-generated content and formal medical vocabularies. This paper presents our submission to the ALTA 2025 Shared Task on ADE normalization, evaluated using Accuracy@k metrics. Our approach employs distinct methodologies for the development and test phases. In the development phase, we propose a three-stage neural architecture: (1) bi-encoder training to establish semantic representations, (2) lexical-aware fine-tuning to capture morphological patterns alongside semantic similarity, and (3) cross-encoder re-ranking for fine-grained discrimination, enabling the model to leverage both distributional semantics and lexical cues through explicit interaction modeling. For the test phase, we reuse the bi-encoder trained in stage (1) for efficient candidate retrieval, then adopt an alternative re-ranking pipeline leveraging large language models with tool-augmented retrieval and multi-stage reasoning. Specifically, a capable model performs reasoning-guided candidate selection over the retrieved top-k results, a lightweight model provides iterative feedback based on reasoning traces, and an automated verification module ensures output correctness with self-correction mechanisms. Our system achieves competitive performance on both development and test benchmarks, demonstrating the efficacy of neural retrieval-reranking architectures and the versatility of LLM-augmented neural pipelines for medical entity normalization tasks.
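
The retrieve-then-rerank backbone described in the abstract is a standard pattern; below is a minimal sketch, assuming the sentence-transformers library. The checkpoint names, the example mention, and the tiny MedDRA term list are illustrative placeholders, not the system's actual models or data.

```python
# Minimal retrieve-then-rerank sketch for ADE normalization.
# Models and data here are illustrative assumptions, not DRAGON's artifacts.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

mention = "couldn't sleep at all after taking it"  # colloquial ADE mention
meddra_terms = ["Insomnia", "Somnolence", "Nausea", "Headache"]  # toy inventory

# Stage (1): bi-encoder retrieval -- embed mention and terms independently,
# then keep the top-k terms by cosine similarity.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
term_emb = bi_encoder.encode(meddra_terms, convert_to_tensor=True)
mention_emb = bi_encoder.encode(mention, convert_to_tensor=True)
hits = util.semantic_search(mention_emb, term_emb, top_k=3)[0]
candidates = [meddra_terms[h["corpus_id"]] for h in hits]

# Stage (3): cross-encoder re-ranking -- score (mention, candidate) pairs
# jointly for fine-grained discrimination.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = cross_encoder.predict([(mention, c) for c in candidates])
best_term, _ = max(zip(candidates, scores), key=lambda pair: pair[1])
print(best_term)
```

The paper's stage (2) lexical-aware fine-tuning and the LLM-based test-phase re-ranker sit around these two steps in the full system and are omitted from this sketch.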

Challenge Track: JHARNA-MT: A Copy-Augmented Hybrid of LoRA-Tuned NLLB and Lexical SMT with Minimum Bayes Risk Decoding for Low-Resource Indic Languages
Dao Sy Duy Minh | Trung Kiet Huynh | Tran Chi Nguyen | Phu Quy Nguyen Lam | Phu-Hoa Pham | Nguyễn Đình Hà Dương | Dien Dinh | Long HB Nguyen
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)

This paper describes JHARNA-MT, our system for the MMLoSo 2025 Shared Task on translation between high-resource languages (Hindi, English) and four low-resource Indic tribal languages: Bhili, Gondi, Mundari, and Santali. The task poses significant challenges, including data sparsity, morphological richness, and structural divergence across language pairs. To address these, we propose a hybrid translation pipeline that integrates non-parametric retrieval, lexical statistical machine translation (SMT), and LoRA-tuned NLLB-200 neural machine translation under a unified Minimum Bayes Risk (MBR) decoding framework. Exact and fuzzy retrieval exploit redundancy in government and administrative texts, SMT with diagonal alignment priors and back-translation provides lexically faithful hypotheses, and the NLLB-LoRA component contributes fluent neural candidates. MBR decoding selects consensus translations using a metric-matched utility based on a weighted combination of BLEU and chrF, mitigating the complementary error modes of SMT and NMT. Our final system, further enhanced with script-aware digit normalization and entity-preserving post-processing, achieves a private leaderboard score of 186.37 and ranks 2nd overall in the shared task, with ablation studies confirming the contribution of each component.
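
A compact illustration of the MBR selection step described above, assuming sacrebleu for the sentence-level metrics; the weight alpha and the hypothesis pool are illustrative assumptions, not the system's tuned values or actual outputs.

```python
# Minimal sketch of Minimum Bayes Risk (MBR) consensus selection over a
# pool of candidate translations; assumes sacrebleu is installed.
import sacrebleu

def utility(hyp: str, ref: str, alpha: float = 0.5) -> float:
    # Metric-matched utility: weighted combination of sentence BLEU and chrF
    # (both on a 0-100 scale in sacrebleu). alpha is an assumed weight.
    bleu = sacrebleu.sentence_bleu(hyp, [ref]).score
    chrf = sacrebleu.sentence_chrf(hyp, [ref]).score
    return alpha * bleu + (1.0 - alpha) * chrf

def mbr_select(hypotheses: list[str]) -> str:
    # Choose the hypothesis with the highest total utility against all
    # others, i.e. the consensus translation under the combined metric.
    def expected_utility(i: int) -> float:
        return sum(utility(hypotheses[i], h)
                   for j, h in enumerate(hypotheses) if j != i)
    best_i = max(range(len(hypotheses)), key=expected_utility)
    return hypotheses[best_i]

# Candidate pool, e.g. one hypothesis each from retrieval, SMT, and
# NLLB-LoRA (placeholder strings, not real system outputs).
pool = [
    "the village school opened today",
    "village school opened today",
    "the school of village open today",
]
print(mbr_select(pool))
```

In the full system the pool combines retrieval, SMT, and NLLB-LoRA candidates, with the BLEU/chrF weights matched to the shared-task metric.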

Systematic Evaluation of Machine Learning and Transformer-Based Methods for Scientific Telescope Literature Classification
Huynh Trung Kiet | Dao Sy Duy Minh | Tran Chi Nguyen | Nguyen Lam Phu Quy | Pham Phu Hoa | Nguyen Dinh Ha Duong | Dinh Dien | Nguyen Hong Buu Long
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications

Recent space missions such as Hubble, Chandra, and JWST have produced a rapidly growing body of scientific literature. Maintaining telescope bibliographies is essential for mission assessment and research traceability, yet current curation processes rely heavily on manual annotation and do not scale. To facilitate progress in this direction, the TRACS @ WASP 2025 shared task provides a benchmark for automatic telescope bibliographic classification based on scientific publications. In this work, we conduct a comparative study of modeling strategies for this task. We first explore traditional machine learning methods such as multinomial Naive Bayes with TF–IDF and CountVectorizer representations. We then evaluate transformer-based multi-label classification using BERT-based scientific language models. Finally, we investigate a task-wise classification approach, where we decompose the problem into separate prediction tasks and train a dedicated model for each. In addition, we experiment with a limited-resource LLM-based approach, showing that even without full fine-tuning and using only a partial subset of the training data, LLMs exhibit promising potential for telescope classification. Our best system achieves a macro F1 of 0.72 with BERT-based models on the test set, substantially outperforming the official openai-gpt-oss-20b baseline (0.31 macro F1).
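
As a sketch of the transformer-based multi-label setup and the macro-F1 metric mentioned above, assuming Hugging Face transformers and scikit-learn; the SciBERT checkpoint, label names, example text, and gold labels are illustrative assumptions, and the freshly initialized classification head would of course need fine-tuning before its predictions mean anything.

```python
# Minimal sketch of multi-label classification with a BERT-based scientific
# language model, evaluated with macro F1. Everything task-specific here
# (checkpoint, labels, text) is an illustrative assumption.
import torch
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["hubble", "chandra", "jwst"]  # hypothetical telescope labels
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # sigmoid scores, BCE loss
)

texts = ["We analyse deep-field imaging obtained with the JWST NIRCam ..."]
gold = torch.tensor([[0, 0, 1]])  # toy ground truth: JWST only

enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
preds = (torch.sigmoid(logits) > 0.5).int()  # 0.5 is a common default cutoff

# Macro F1 averages per-label F1 scores, matching the shared-task metric.
print(f1_score(gold.numpy(), preds.numpy(), average="macro", zero_division=0))
```

The task-wise variant described in the abstract would instead train one such model per prediction task rather than a single multi-label head.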