Xiaolan Wang


2022

Comparative Opinion Summarization via Collaborative Decoding
Hayate Iso | Xiaolan Wang | Stefanos Angelidis | Yoshihiko Suhara
Findings of the Association for Computational Linguistics: ACL 2022

Opinion summarization focuses on generating summaries that reflect popular subjective information expressed in multiple online reviews. While generated summaries offer general and concise information about a particular hotel or product, the information may be insufficient to help the user compare multiple different choices. Thus, the user may still struggle with the question “Which one should I pick?” In this paper, we propose the comparative opinion summarization task, which aims at generating two contrastive summaries and one common summary from two different candidate sets of reviews. We develop a comparative summarization framework CoCoSum, which consists of two base summarization models that jointly generate contrastive and common summaries. Experimental results on a newly created benchmark CoCoTrip show that CoCoSum can produce higher-quality contrastive and common summaries than state-of-the-art opinion summarization models. The dataset and code are available at https://github.com/megagonlabs/cocosum

Summarizing Community-based Question-Answer Pairs
Ting-Yao Hsu | Yoshi Suhara | Xiaolan Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Community-based Question Answering (CQA), which allows users to acquire their desired information, has increasingly become an essential component of online services in various domains such as E-commerce, travel, and dining. However, the overwhelming number of CQA pairs makes it difficult for users without a particular intent to find useful information spread across them. To help users quickly digest the key information, we propose the novel CQA summarization task, which aims to create a concise summary from CQA pairs. To this end, we first design a multi-stage data annotation process and create a benchmark dataset, COQASUM, based on the Amazon QA corpus. We then compare a collection of extractive and abstractive summarization methods and establish a strong baseline approach, DedupLED, for the CQA summarization task. Our experiments further confirm two key challenges for the CQA summarization task: sentence-type transfer and duplicate removal. Our data and code are publicly available.

2021

Extractive Opinion Summarization in Quantized Transformer Spaces
Stefanos Angelidis | Reinald Kim Amplayo | Yoshihiko Suhara | Xiaolan Wang | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 9

We present the Quantized Transformer (QT), an unsupervised system for extractive opinion summarization. QT is inspired by Vector-Quantized Variational Autoencoders, which we repurpose for popularity-driven summarization. It uses a clustering interpretation of the quantized space and a novel extraction algorithm to discover popular opinions among hundreds of reviews, a significant step towards opinion summarization of practical scope. In addition, QT enables controllable summarization without further training, by utilizing properties of the quantized space to extract aspect-specific summaries. We also make publicly available Space, a large-scale evaluation benchmark for opinion summarizers, comprising general and aspect-specific summaries for 50 hotels. Experiments demonstrate the promise of our approach, which is validated by human studies where judges showed clear preference for our method over competitive baselines.
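The clustering interpretation of a quantized space can be illustrated with a small Python sketch: each sentence vector is snapped to its nearest codeword, codewords are ranked by how many sentences they attract, and the sentence closest to each popular codeword is extracted. This is a deliberately simplified, single-vector illustration, not QT's extraction algorithm: the sentence vectors and codebook are assumed to be given (in QT they come from a trained Transformer), and the helper names and the "closest member" selection rule are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to `vec` (the quantization step)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def extract_popular(
    sentences: Sequence[str],
    embeddings: Sequence[Vector],
    codebook: Sequence[Vector],
    num_clusters: int,
) -> List[str]:
    """Treat each codeword as a cluster; return one representative per popular cluster."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for idx, emb in enumerate(embeddings):
        clusters[nearest_code(emb, codebook)].append(idx)
    # Popularity = number of sentences assigned to a codeword.
    ranked = sorted(clusters, key=lambda k: len(clusters[k]), reverse=True)
    summary = []
    for k in ranked[:num_clusters]:
        # Representative (assumption): the member sentence closest to its codeword.
        best = min(clusters[k], key=lambda i: sq_dist(embeddings[i], codebook[k]))
        summary.append(sentences[best])
    return summary
```

Ranking codewords by assignment count is what makes the extraction popularity-driven: an opinion repeated across many reviews pulls many sentences into the same cluster.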

Convex Aggregation for Opinion Summarization
Hayate Iso | Xiaolan Wang | Yoshihiko Suhara | Stefanos Angelidis | Wang-Chiew Tan
Findings of the Association for Computational Linguistics: EMNLP 2021

Recent advances in text autoencoders have significantly improved the quality of the latent space, which enables models to generate grammatical and consistent text from aggregated latent vectors. As a successful application of this property, unsupervised opinion summarization models generate a summary by decoding the aggregated latent vectors of inputs. More specifically, they perform the aggregation via simple average. However, little is known about how the vector aggregation step affects the generation quality. In this study, we revisit the commonly used simple average approach by examining the latent space and generated summaries. We found that text autoencoders tend to generate overly generic summaries from simply averaged latent vectors due to an unexpected L2-norm shrinkage in the aggregated latent vectors, which we refer to as summary vector degeneration. To overcome this issue, we develop a framework Coop, which searches input combinations for the latent vector aggregation using input-output word overlap. Experimental results show that Coop successfully alleviates the summary vector degeneration issue and establishes new state-of-the-art performance on two opinion summarization benchmarks. Code is available at https://github.com/megagonlabs/coop.
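The two ideas in this abstract can be illustrated in Python: simply averaging latent vectors shrinks their L2 norm, and Coop instead searches over input combinations, scoring each decoded candidate by word overlap with the inputs. The sketch below is a simplified illustration under stated assumptions, not the released Coop code: `decode` is a hypothetical callback standing in for a text autoencoder's decoder, and `word_overlap` is an assumed, simplified overlap score rather than the exact metric used in the paper.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Simple average: the baseline aggregation the abstract revisits."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def word_overlap(summary: str, inputs: Sequence[str]) -> float:
    """Stand-in overlap score (assumption): share of summary words seen in any input."""
    summary_words = set(summary.lower().split())
    input_words = {w for text in inputs for w in text.lower().split()}
    if not summary_words:
        return 0.0
    return len(summary_words & input_words) / len(summary_words)


def coop_style_search(
    latents: Sequence[Vector],
    inputs: Sequence[str],
    decode: Callable[[Vector], str],
) -> Tuple[str, float]:
    """Average every non-empty subset of input latents, decode, keep the best-scoring summary."""
    best_summary, best_score = "", -1.0
    n = len(latents)
    for k in range(1, n + 1):
        for subset in itertools.combinations(range(n), k):
            z = mean_vector([latents[i] for i in subset])
            summary = decode(z)  # `decode` is a hypothetical decoder callback
            score = word_overlap(summary, inputs)
            if score > best_score:
                best_summary, best_score = summary, score
    return best_summary, best_score


if __name__ == "__main__":
    # Norm shrinkage: the mean of random high-dimensional vectors is much
    # shorter than the vectors being averaged.
    random.seed(0)
    vecs = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(8)]
    print("mean of individual norms:", sum(map(l2_norm, vecs)) / len(vecs))
    print("norm of averaged vector:", l2_norm(mean_vector(vecs)))
```

By the triangle inequality, the norm of an average can never exceed the average of the norms, and for roughly independent high-dimensional vectors it is far smaller, which is the shrinkage the abstract describes. Enumerating every non-empty subset costs 2^n - 1 decoder calls, so this exhaustive form is only practical for small input sets.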

2020

OpinionDigest: A Simple Framework for Opinion Summarization
Yoshihiko Suhara | Xiaolan Wang | Stefanos Angelidis | Wang-Chiew Tan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present OpinionDigest, an abstractive opinion summarization framework, which does not rely on gold-standard summaries for training. The framework uses an Aspect-based Sentiment Analysis model to extract opinion phrases from reviews, and trains a Transformer model to reconstruct the original reviews from these extractions. At summarization time, we merge extractions from multiple reviews and select the most popular ones. The selected opinions are used as input to the trained Transformer model, which verbalizes them into an opinion summary. OpinionDigest can also generate customized summaries, tailored to specific user needs, by filtering the selected opinions according to their aspect and/or sentiment. Automatic evaluation on Yelp data shows that our framework outperforms competitive baselines. Human studies on two corpora verify that OpinionDigest produces informative summaries and shows promising customization capabilities.
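The selection step described above (merge extractions from several reviews, rank them by popularity, optionally filter by aspect or sentiment) can be sketched in Python. This is an illustrative sketch under assumptions, not the OpinionDigest implementation: extractions are assumed to arrive as already-labeled `Opinion` records, popularity is taken as an exact-match count with each opinion counted at most once per review, and the `[SEP]` separator used to build the Transformer's input string is a hypothetical choice. The trained verbalization model itself is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Opinion:
    phrase: str     # extracted opinion phrase, e.g. "friendly staff"
    aspect: str     # aspect label from the ABSA model (assumed), e.g. "service"
    sentiment: str  # sentiment label (assumed), e.g. "positive" or "negative"


def select_opinions(
    extractions_per_review: Iterable[Iterable[Opinion]],
    top_k: int,
    aspect: Optional[str] = None,
    sentiment: Optional[str] = None,
) -> List[Opinion]:
    """Merge extractions across reviews and keep the most popular ones."""
    counts: Counter = Counter()
    for extractions in extractions_per_review:
        # Assumption: an opinion counts at most once per review
        # (dict.fromkeys removes duplicates while keeping order).
        unique = list(dict.fromkeys(extractions))
        counts.update(unique)
    # Optional customization: keep only the requested aspect and/or sentiment.
    candidates = [
        op for op in counts
        if (aspect is None or op.aspect == aspect)
        and (sentiment is None or op.sentiment == sentiment)
    ]
    # Most frequent first; ties keep first-seen order because sorting is stable.
    candidates.sort(key=lambda op: counts[op], reverse=True)
    return candidates[:top_k]


def to_model_input(opinions: Iterable[Opinion], sep: str = " [SEP] ") -> str:
    """Serialize selected phrases into one source string (separator is hypothetical)."""
    return sep.join(op.phrase for op in opinions)
```

Passing `aspect="room"` or `sentiment="positive"` mirrors the customized summaries mentioned in the abstract: the same merged counts are reused and only the filter changes.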