Tuan Dung Nguyen
2026
Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework
Cong Huy Nguyen | Son Dinh Nguyen | Guanlin Li | Tuan Dung Nguyen | Aditya Narayan Sankaran | Mai Huy Thong | Thanh Trung Nguyen | Mai Hong Son | Reza Farahbakhsh | Phi Le Nguyen | Noel Crespi
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automated medical report generation for 3D PET/CT imaging is fundamentally challenged by the high-dimensional nature of volumetric data and a critical scarcity of annotated datasets, particularly for low-resource languages. Current black-box methods map whole volumes to reports, ignoring the clinical workflow of analyzing localized Regions of Interest (RoIs) to derive diagnostic conclusions. In this paper, we bridge this gap by introducing VietPET-RoI, the first large-scale 3D PET/CT dataset with fine-grained RoI annotation for a low-resource language, comprising 600 PET/CT samples and 1,960 manually annotated RoIs, paired with corresponding clinical reports. Furthermore, to demonstrate the utility of this dataset, we propose HiRRA, a novel framework that mimics the professional radiologist diagnostic workflow by employing graph-based relational modules to capture dependencies between RoI attributes. This approach shifts from global pattern matching toward localized clinical findings. Additionally, we introduce new clinical evaluation metrics, namely RoI Coverage and RoI Quality Index, that measure both RoI localization accuracy and attribute description fidelity using LLM-based extraction. Extensive evaluation demonstrates that our framework achieves SOTA performance, surpassing existing models by 19.7% in BLEU and 4.7% in ROUGE-L, while achieving a remarkable 45.8% improvement in clinical metrics, indicating enhanced clinical reliability and reduced hallucination. Our code and dataset are available on GitHub.
2025
AstroMLab 5: Structured Summaries and Concept Extraction for 400,000 Astrophysics Papers
Yuan-Sen Ting | Alberto Accomazzi | Tirthankar Ghosal | Tuan Dung Nguyen | Rui Pan | Zechang Sun | Tijmen de Haan
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications
We present a dataset of 408,590 astrophysics papers from arXiv (astro-ph), spanning 1992 through July 2025. Each paper has been processed through a multi-stage pipeline to produce: (1) structured summaries organized into six semantic sections (Background, Motivation, Methodology, Results, Interpretation, Implication), and (2) concept extraction yielding 9,999 unique concepts with detailed descriptions. The dataset contains 3.8 million paper-concept associations and includes semantic embeddings for all concepts. Comparison with traditional ADS keywords reveals that the concepts provide denser coverage and more uniform distribution, while analysis of embedding space structure demonstrates that concepts are semantically dispersed within papers—enabling discovery through multiple diverse entry points. Concept vocabulary and embeddings are publicly released at https://github.com/tingyuansen/astro-ph_knowledge_graph.
MoVa: Towards Generalizable Classification of Human Morals and Values
Ziyu Chen | Junfei Sun | Chenxi Li | Tuan Dung Nguyen | Jing Yao | Xiaoyuan Yi | Xing Xie | Chenhao Tan | Lexing Xie
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Identifying human morals and values embedded in language is essential to empirical studies of communication. However, researchers often face substantial difficulty navigating the diversity of theoretical frameworks and data available for their analysis. Here, we contribute MoVa, a well-documented suite of resources for generalizable classification of human morals and values, consisting of (1) 16 labeled datasets and benchmarking results from four theoretically-grounded frameworks; (2) a lightweight LLM prompting strategy that outperforms fine-tuned models across multiple domains and frameworks; and (3) a new application that helps evaluate psychological surveys. In practice, we specifically recommend a classification strategy, all@once, that scores all related concepts simultaneously, resembling the well-known multi-label classifier chain. The data and methods in MoVa can facilitate many fine-grained interpretations of human and machine communication, with potential implications for the alignment of machine behavior.
2024
CARER - ClinicAl Reasoning-Enhanced Representation for Temporal Health Risk Prediction
Tuan Dung Nguyen | Thanh Trung Huynh | Minh Hieu Phan | Quoc Viet Hung Nguyen | Phi Le Nguyen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The increasing availability of multimodal data from electronic health records (EHR) has paved the way for deep learning methods to improve diagnosis accuracy. However, deep learning models are data-driven, requiring large-scale datasets to achieve high generalizability. Inspired by how human experts leverage reasoning for medical diagnosis, we propose CARER, a novel health risk prediction framework that enhances deep learning models with clinical rationales derived from medically proficient Large Language Models (LLMs). In addition, we provide a cross-view alignment loss which aligns the “local” view from the patient’s health status with the “global” view from the external LLM’s clinical reasoning to boost mutual feature learning. Through extensive experiments on two predictive tasks using two popular EHR datasets, CARER significantly exceeds the performance of state-of-the-art models by up to 11.2%, especially in improving data efficiency and generalizability. Our code is available at https://github.com/tuandung2812/CARER-EMNLP-2024.
2023
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Tuan Dung Nguyen | Yuan-Sen Ting | Ioana Ciuca | Charles O’Neill | Ze-Chang Sun | Maja Jabłońska | Sandor Kruk | Ernest Perkowski | Jack Miller | Jason Jason Jingsh Li | Josh Peek | Kartheik Iyer | Tomasz Rozanski | Pranav Khetarpal | Sharaf Zaman | David Brodrick | Sergio J. Rodriguez Mendez | Thang Bui | Alyssa Goodman | Alberto Accomazzi | Jill Naiman | Jesse Cranney | Kevin Schawinski | Roberta Raileanu
Proceedings of the Second Workshop on Information Extraction from Scientific Publications
Co-authors
- Alberto Accomazzi 2
- Phi Le Nguyen 2
- Yuan-Sen Ting 2
- David Brodrick 1
- Thang Bui 1
- Ziyu Chen 1
- Ioana Ciuca 1
- Jesse Cranney 1
- Noel Crespi 1
- Reza Farahbakhsh 1
- Tirthankar Ghosal 1
- Alyssa Goodman 1
- Thanh Trung Huynh 1
- Kartheik Iyer 1
- Maja Jabłońska 1
- Pranav Khetarpal 1
- Sandor Kruk 1
- Chenxi Li 1
- Guanlin Li 1
- Jason Jason Jingsh Li 1
- Jack Miller 1
- Jill Naiman 1
- Cong Huy Nguyen 1
- Quoc Viet Hung Nguyen 1
- Son Dinh Nguyen 1
- Thanh Trung Nguyen 1
- Charles O’Neill 1
- Rui Pan 1
- Josh Peek 1
- Ernest Perkowski 1
- Minh Hieu Phan 1
- Roberta Raileanu 1
- Sergio José Rodríguez Méndez 1
- Tomasz Rozanski 1
- Aditya Narayan Sankaran 1
- Kevin Schawinski 1
- Mai Hong Son 1
- Junfei Sun 1
- Ze-Chang Sun 1
- Zechang Sun 1
- Chenhao Tan 1
- Mai Huy Thong 1
- Lexing Xie 1
- Xing Xie 1
- Jing Yao 1
- Xiaoyuan Yi 1
- Sharaf Zaman 1
- Tijmen de Haan 1