Phuong Nguyen

2025

JNLP at SemEval-2025 Task 11: Cross-Lingual Multi-Label Emotion Detection Using Generative Models
Jieying Xue | Phuong Nguyen | Minh Nguyen | Xin Liu
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

With the rapid advancement of global digitalization, users from different countries increasingly rely on social media for information exchange. In this context, multilingual multi-label emotion detection has emerged as a critical research area. This study addresses SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection. Our paper focuses on two sub-tracks of this task: (1) Track A: Multi-label emotion detection, and (2) Track B: Emotion intensity. To tackle multilingual challenges, we leverage pre-trained multilingual models and focus on two architectures: (1) a fine-tuned BERT-based classification model and (2) an instruction-tuned generative LLM. Additionally, we propose two methods for handling multi-label classification: the Base method, which maps an input directly to all its corresponding emotion labels, and the Pairwise method, which models the relationship between the input text and each emotion category individually. Experimental results demonstrate the strong generalization ability of our approach in multilingual emotion recognition. In Track A, our method achieved Top 4 performance across 10 languages, ranking 1st in Hindi. In Track B, our approach also secured Top 5 performance in 7 languages, highlighting its simplicity and effectiveness.
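As a rough illustration of the two multi-label formulations named in this abstract, the sketch below builds training examples in each style. The label set, the prompt wording, and the function names `base_example` and `pairwise_examples` are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical label set (assumption); the task's actual labels may differ.
EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise"]


def base_example(text: str, gold: List[str]) -> Dict[str, str]:
    """Base style: one example per input; the target lists every gold emotion."""
    target = ", ".join(e for e in EMOTIONS if e in gold) or "none"
    return {"input": text, "target": target}


def pairwise_examples(text: str, gold: List[str]) -> List[Dict[str, str]]:
    """Pairwise style: one (text, emotion) example per emotion, with a yes/no target."""
    return [
        {"input": f"{text} [emotion: {e}]", "target": "yes" if e in gold else "no"}
        for e in EMOTIONS
    ]
```

Under these assumptions, the Base style yields one example per input, while the Pairwise style yields one example per input and emotion, so each emotion gets its own binary decision.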

JNLP at SemEval-2025 Task 1: Multimodal Idiomaticity Representation with Large Language Models
Blake Matheny | Phuong Nguyen | Minh Nguyen
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Idioms and figurative language are nuanced linguistic phenomena that convey meaning and culture through non-compositional multi-word expressions. This type of figurative language remains difficult for small and large language models to handle. Various attempts have been made to identify idiomaticity in text. The approach presented in this paper is an intuitive attempt at Task 1: AdMIRe Subtask A, which requires correctly ordering a series of images and captions; it concatenates the image captions as a sequence. The methods rely on a pre-trained vision and language model for the image portion of the task and on an instruction fine-tuned large language model, used as a causal language model, for the caption portion. The results are informative for future iterations, but not comparably substantial.

2023

StructSP: Efficient Fine-tuning of Task-Oriented Dialog System by Using Structure-aware Boosting and Grammar Constraints
Truong Do | Phuong Nguyen | Minh Nguyen
Findings of the Association for Computational Linguistics: ACL 2023

We have investigated methods utilizing hierarchical structure information representation in the semantic parsing task and have devised a method that reinforces the semantic awareness of a pre-trained language model via a two-step fine-tuning mechanism: hierarchical structure information strengthening and a final specific task. The model used is better than existing ones at learning the contextual representations of utterances embedded within its hierarchical semantic structure and thereby improves system performance. In addition, we created a mechanism using inductive grammar to dynamically prune the unpromising directions in the semantic structure parsing process. Finally, through experiments on the TOP and TOPv2 (low-resource setting) datasets, we achieved state-of-the-art (SOTA) performance, confirming the effectiveness of our proposed model. Our code will be published when this paper is accepted.

2022

Complex Word Identification in Vietnamese: Towards Vietnamese Text Simplification
Phuong Nguyen | David Kauchak
Proceedings of the Workshop on Multilingual Information Access (MIA)

Text Simplification has been an extensively researched problem in English, but has not been investigated in Vietnamese. We focus on the Vietnamese-specific Complex Word Identification task, often the first step in Lexical Simplification (Shardlow, 2013). We examine three different Vietnamese datasets constructed for other Natural Language Processing tasks and show that, as in other languages, frequency is a strong signal in determining whether a word is complex, with a mean accuracy of 86.87%. Across the datasets, we find that the 10% most frequent words in a corpus can be labelled as simple, and the rest as complex, though this is more variable for smaller corpora. We also examine how human annotators perform at this task. Given the subjective nature of the task, there is a fair amount of variability in which words are seen as difficult, though majority results are more consistent.
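The frequency heuristic described in this abstract can be sketched roughly as follows. This is a minimal illustration that assumes the corpus is already tokenized into words; the function name `label_by_frequency` and the tie-breaking behaviour are assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Iterable


def label_by_frequency(tokens: Iterable[str], simple_fraction: float = 0.10) -> Dict[str, str]:
    """Label the most frequent `simple_fraction` of distinct words as simple."""
    counts = Counter(tokens)
    # Distinct words ordered from most to least frequent.
    ranked = [word for word, _ in counts.most_common()]
    # Keep at least one word in the "simple" class for tiny vocabularies.
    cutoff = max(1, int(len(ranked) * simple_fraction))
    return {
        word: ("simple" if rank < cutoff else "complex")
        for rank, word in enumerate(ranked)
    }
```

With the default `simple_fraction` of 0.10, a vocabulary of ten distinct words would place only the single most frequent word in the simple class.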

2021

CovRelex: A COVID-19 Retrieval System with Relation Extraction
Vu Tran | Van-Hien Tran | Phuong Nguyen | Chau Nguyen | Ken Satoh | Yuji Matsumoto | Minh Nguyen
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

This paper presents CovRelex, a scientific paper retrieval system targeting entities and relations via relation extraction on COVID-19 scientific papers. This work aims to build a system that helps users efficiently acquire knowledge from the large and rapidly growing body of COVID-19 scientific papers. Our system can be accessed via https://www.jaist.ac.jp/is/labs/nguyen-lab/systems/covrelex/.
