Minh-Tien Nguyen


2022

pdf
Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation
Shumpei Inoue | Tsungwei Liu | Son Nguyen | Minh-Tien Nguyen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

This paper introduces a model for incomplete utterance restoration (IUR) called JET (Joint learning token Extraction and Text generation). Different from prior studies that only work on extraction or abstraction datasets, we design a simple but effective model, working for both scenarios of IUR. Our design simulates the nature of IUR, where omitted tokens from the context contribute to restoration. From this, we construct a Picker that identifies the omitted tokens. To support the picker, we design two label creation methods (soft and hard labels), which can work in cases of no annotation data for the omitted tokens. The restoration is done by using a Generator with the help of the Picker on joint learning. Promising results on four benchmark datasets in extraction and abstraction scenarios show that our model is better than the pretrained T5 and non-generative language model methods in both rich and limited training data settings.

pdf
Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback
Duy-Hung Nguyen | Nguyen Viet Dung Nghiem | Bao-Sinh Nguyen | Dung Tien Tien Le | Shahab Sabahi | Minh-Tien Nguyen | Hung Le
Findings of the Association for Computational Linguistics: NAACL 2022

For summarization, human preferences is critical to tame outputs of the summarizer in favor of human interests, as ground-truth summaries are scarce and ambiguous. Practical settings require dynamic exchanges between humans and AI agents wherein feedback is provided in an online manner, a few at a time. In this paper, we introduce a new framework to train summarization models with preference feedback interactively. By properly leveraging offline data and a novel reward model, we improve the performance regarding ROUGE scores and sample-efficiency. Our experiments on three various datasets confirm the benefit of the proposed framework in active, few-shot and online settings of preference learning.

pdf
Meeting Decision Tracker: Making Meeting Minutes with De-Contextualized Utterances
Shumpei Inoue | Hy Nguyen | Hoang Pham | Tsungwei Liu | Minh-Tien Nguyen
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations

Meetings are a universal process to make decisions in business and project collaboration. The capability to automatically itemize the decisions in daily meetings allows for extensive tracking of past discussions. To that end, we developed Meeting Decision Tracker, a prototype system to construct decision items comprising decision utterance detector (DUD) and decision utterance rewriter (DUR). We show that DUR makes a sizable contribution to improving the user experience by dealing with utterance collapse in natural conversation. An introduction video of our system is also available at https://youtu.be/TG1pJJo0Iqo.

2020

pdf
Understanding Transformers for Information Extraction with Limited Data
Minh-Tien Nguyen | Dung Tien Le | Nguyen Hong Son | Bui Cong Minh | Do Hoang Thai Duong | Le Thai Linh
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

2018

pdf
TSix: A Human-involved-creation Dataset for Tweet Summarization
Minh-Tien Nguyen | Dac Viet Lai | Huy-Tien Nguyen | Le-Minh Nguyen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf
VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization
Minh-Tien Nguyen | Dac Viet Lai | Phong-Khac Do | Duc-Vu Tran | Minh-Le Nguyen
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper presents VSoLSCSum, a Vietnamese linked sentence-comment dataset, which was manually created to treat the lack of standard corpora for social context summarization in Vietnamese. The dataset was collected through the keywords of 141 Web documents in 12 special events, which were mentioned on Vietnamese Web pages. Social users were asked to involve in creating standard summaries and the label of each sentence or comment. The inter-agreement calculated by Cohen’s Kappa among raters after validating is 0.685. To illustrate the potential use of our dataset, a learning to rank method was trained by using a set of local and social features. Experimental results indicate that the summary model trained on our dataset outperforms state-of-the-art baselines in both ROUGE-1 and ROUGE-2 in social context summarization.