Shumpei Inoue


2023

pdf
Towards Safer Operations: An Expert-involved Dataset of High-Pressure Gas Incidents for Preventing Future Failures
Shumpei Inoue | Minh-Tien Nguyen | Hiroki Mizokuchi | Tuan-Anh Nguyen | Huu-Hiep Nguyen | Dung Le
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

This paper introduces a new IncidentAI dataset for safety prevention. Different from prior corpora that usually contain a single task, our dataset comprises three tasks: named entity recognition, cause-effect extraction, and information retrieval. The dataset is annotated by domain experts who have at least six years of practical experience as high-pressure gas conservation managers. We validate the contribution of the dataset in the scenario of safety prevention. Preliminary results on the three tasks show that NLP techniques are beneficial for analyzing incident reports to prevent future failures. The dataset facilitates future research in NLP and incident management communities. The access to the dataset is also provided (The IncidentAI dataset is available at: https://github.com/Cinnamon/incident-ai-dataset).

2022

pdf
Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation
Shumpei Inoue | Tsungwei Liu | Son Nguyen | Minh-Tien Nguyen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

This paper introduces a model for incomplete utterance restoration (IUR) called JET (Joint learning token Extraction and Text generation). Different from prior studies that only work on extraction or abstraction datasets, we design a simple but effective model, working for both scenarios of IUR. Our design simulates the nature of IUR, where omitted tokens from the context contribute to restoration. From this, we construct a Picker that identifies the omitted tokens. To support the picker, we design two label creation methods (soft and hard labels), which can work in cases of no annotation data for the omitted tokens. The restoration is done by using a Generator with the help of the Picker on joint learning. Promising results on four benchmark datasets in extraction and abstraction scenarios show that our model is better than the pretrained T5 and non-generative language model methods in both rich and limited training data settings.

pdf
Meeting Decision Tracker: Making Meeting Minutes with De-Contextualized Utterances
Shumpei Inoue | Hy Nguyen | Hoang Pham | Tsungwei Liu | Minh-Tien Nguyen
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations

Meetings are a universal process to make decisions in business and project collaboration. The capability to automatically itemize the decisions in daily meetings allows for extensive tracking of past discussions. To that end, we developed Meeting Decision Tracker, a prototype system to construct decision items comprising decision utterance detector (DUD) and decision utterance rewriter (DUR). We show that DUR makes a sizable contribution to improving the user experience by dealing with utterance collapse in natural conversation. An introduction video of our system is also available at https://youtu.be/TG1pJJo0Iqo.