Huu-Hiep Nguyen

Also published as: Huu Hiep Nguyen


2023

Towards Safer Operations: An Expert-involved Dataset of High-Pressure Gas Incidents for Preventing Future Failures
Shumpei Inoue | Minh-Tien Nguyen | Hiroki Mizokuchi | Tuan-Anh Nguyen | Huu-Hiep Nguyen | Dung Le
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

This paper introduces IncidentAI, a new dataset for safety prevention. Unlike prior corpora, which usually cover a single task, our dataset comprises three tasks: named entity recognition, cause-effect extraction, and information retrieval. The dataset is annotated by domain experts who have at least six years of practical experience as high-pressure gas conservation managers. We validate the contribution of the dataset in the scenario of safety prevention. Preliminary results on the three tasks show that NLP techniques are beneficial for analyzing incident reports to prevent future failures. The dataset facilitates future research in the NLP and incident management communities. The IncidentAI dataset is available at https://github.com/Cinnamon/incident-ai-dataset.

VTCC-NLP at SemEval-2023 Task 6: Long-Text Representation Based on Graph Neural Network for Rhetorical Roles Prediction
Huu Hiep Nguyen | Hoang Ngo | Khac-Hoai Nam Bui
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Rhetorical Roles (RR) prediction assigns a label to each sentence in a legal document and is regarded as an emerging task for legal document understanding. In this study, we present a novel method for the RR task that exploits long-context representation. Specifically, legal documents are long texts, and previous works are unable to consider the inherent dependencies among sentences. We propose GNNRR (Graph Neural Network for Rhetorical Roles Prediction), which is able to model the cross-information for long texts. Furthermore, we develop multitask learning by incorporating label shift prediction (LSP) for segmenting a legal document. The proposed model is evaluated on the RR sub-task of SemEval-2023 Task 6 (LegalEval: Understanding Legal Texts). Our method places in the top 4 on the public leaderboard of the sub-task. Our source code is available for further investigation at https://github.com/hiepnh137/SemEval2023-Task6-Rhetorical-Roles.