Han Jiang


2023

Large-Scale and Multi-Perspective Opinion Summarization with Diverse Review Subsets
Han Jiang | Rui Wang | Zhihua Wei | Yu Li | Xinpeng Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Opinion summarization is expected to digest larger review sets and provide summaries from different perspectives. However, most existing solutions fall short at condensing extensive reviews and offering opinion summaries from various angles because they lack explicit designs for information selection. To this end, we propose SubSumm, a supervised summarization framework for large-scale multi-perspective opinion summarization. SubSumm consists of a set of review sampling strategies and a two-stage training scheme. The sampling strategies take sentiment orientation and contrastive information value into consideration, which allows review subsets of different perspectives and quality levels to be selected. The summarizer is then encouraged to learn from the sub-optimal and optimal subsets in succession so that it can capitalize on the massive input. Experimental results on the AmaSum and Rotten Tomatoes datasets demonstrate that SubSumm is adept at generating pros, cons, and verdict summaries from hundreds of input reviews. Furthermore, our in-depth analysis verifies that the advanced selection of review subsets and the two-stage training scheme are vital to boosting summarization performance.

ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang | Xiaoyuan Yi | Han Jiang | Shanlin Zhou | Zhihua Wei | Xing Xie
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recent large-scale Visual-Language Generative Models (VLGMs) have achieved unprecedented improvement in multimodal image/text generation. However, these models might also generate toxic content, e.g., offensive text and pornographic images, raising significant ethical risks. Despite extensive studies on the toxic degeneration of language models, this problem remains largely unexplored in the context of visual-language generation. This work examines the propensity for toxicity generation and the susceptibility to toxic data across various VLGMs. For this purpose, we build ToViLaG, a dataset comprising 32K co-toxic/mono-toxic text-image pairs and 1K innocuous but evocative texts that tend to stimulate toxicity. Furthermore, we propose WInToRe, a novel toxicity metric tailored to visual-language generation, which theoretically reflects different aspects of toxicity by considering both input and output. On this basis, we benchmark the toxicity of a diverse spectrum of VLGMs and find that some models do more evil than expected while others are more vulnerable to infection, underscoring the necessity of VLGM detoxification. We therefore develop an innovative bottleneck-based detoxification method. Our method can reduce toxicity while maintaining comparable generation quality, providing a promising initial solution to this line of research.

2022

CHAE: Fine-Grained Controllable Story Generation with Characters, Actions and Emotions
Xinpeng Wang | Han Jiang | Zhihua Wei | Shanlin Zhou
Proceedings of the 29th International Conference on Computational Linguistics

Story generation has emerged as an interesting yet challenging NLP task in recent years. Some existing studies aim at generating fluent and coherent stories from keywords and outlines, while others attempt to control global features of the story, such as emotion, style, and topic. However, these works focus on coarse-grained control of the story and neglect control of its details, which is also crucial for the task. To fill this gap, this paper proposes a model for fine-grained control of the story, which allows the generation of customized stories in which characters and their corresponding actions and emotions can be arbitrarily assigned. Extensive results from both automatic and human evaluations show the superiority of our method. It has strong controllability to generate stories according to fine-grained personalized guidance, demonstrating the effectiveness of our methodology. Our code is available at https://github.com/victorup/CHAE.

2013

Semantic v.s. Positions: Utilizing Balanced Proximity in Language Model Smoothing for Information Retrieval
Rui Yan | Han Jiang | Mirella Lapata | Shou-De Lin | Xueqiang Lv | Xiaoming Li
Proceedings of the Sixth International Joint Conference on Natural Language Processing