Tuan Dung Nguyen


2025

MoVa: Towards Generalizable Classification of Human Morals and Values
Ziyu Chen | Junfei Sun | Chenxi Li | Tuan Dung Nguyen | Jing Yao | Xiaoyuan Yi | Xing Xie | Chenhao Tan | Lexing Xie
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Identifying human morals and values embedded in language is essential to empirical studies of communication. However, researchers often face substantial difficulty navigating the diversity of theoretical frameworks and data available for their analysis. Here, we contribute MoVa, a well-documented suite of resources for generalizable classification of human morals and values, consisting of (1) 16 labeled datasets and benchmarking results from four theoretically-grounded frameworks; (2) a lightweight LLM prompting strategy that outperforms fine-tuned models across multiple domains and frameworks; and (3) a new application that helps evaluate psychological surveys. In practice, we specifically recommend a classification strategy, all@once, that scores all related concepts simultaneously, resembling the well-known multi-label classifier chain. The data and methods in MoVa can facilitate many fine-grained interpretations of human and machine communication, with potential implications for the alignment of machine behavior.

2024

CARER - ClinicAl Reasoning-Enhanced Representation for Temporal Health Risk Prediction
Tuan Dung Nguyen | Thanh Trung Huynh | Minh Hieu Phan | Quoc Viet Hung Nguyen | Phi Le Nguyen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The increasing availability of multimodal data from electronic health records (EHR) has paved the way for deep learning methods to improve diagnosis accuracy. However, deep learning models are data-driven, requiring large-scale datasets to achieve high generalizability. Inspired by how human experts leverage reasoning for medical diagnosis, we propose CARER, a novel health risk prediction framework that enhances deep learning models with clinical rationales derived from medically proficient Large Language Models (LLMs). In addition, we provide a cross-view alignment loss that aligns the “local” view from the patient’s health status with the “global” view from the external LLM’s clinical reasoning to boost mutual feature learning. Through extensive experiments on two predictive tasks using two popular EHR datasets, CARER significantly exceeds the performance of state-of-the-art models by up to 11.2%, especially in improving data efficiency and generalizability. Our code is available at https://github.com/tuandung2812/CARER-EMNLP-2024

2023

AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Tuan Dung Nguyen | Yuan-Sen Ting | Ioana Ciuca | Charles O’Neill | Ze-Chang Sun | Maja Jabłońska | Sandor Kruk | Ernest Perkowski | Jack Miller | Jason Jason Jingsh Li | Josh Peek | Kartheik Iyer | Tomasz Rozanski | Pranav Khetarpal | Sharaf Zaman | David Brodrick | Sergio J. Rodriguez Mendez | Thang Bui | Alyssa Goodman | Alberto Accomazzi | Jill Naiman | Jesse Cranney | Kevin Schawinski | Roberta Raileanu
Proceedings of the Second Workshop on Information Extraction from Scientific Publications