Xinnuo Xu


2021

MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
Xinnuo Xu | Ondřej Dušek | Shashi Narayan | Verena Rieser | Ioannis Konstas
Findings of the Association for Computational Linguistics: EMNLP 2021

One of the most challenging aspects of current single-document news summarization is that the summary often contains ‘extrinsic hallucinations’, i.e., facts that are not present in the source document and are often derived via world knowledge. This causes summarization systems to act more like open-ended language models that tend to hallucinate erroneous facts. In this paper, we mitigate this problem with the help of multiple supplementary resource documents assisting the task. We present a new dataset, MiRANews, and benchmark existing summarization models on it. In contrast to multi-document summarization, which addresses multiple events from several source documents, we still aim at generating a summary for a single document. We show via data analysis that it is not only the models that are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded in the assisting documents than in the main source articles. An error analysis of summaries generated by pretrained models fine-tuned on MiRANews reveals that this has an even bigger effect on models: assisted summarization reduces hallucinations by 55% when compared to single-document summarization models trained on the main article only.
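To make the assisted setup concrete, here is a minimal sketch of how a main article and its assisting documents might be fed to an off-the-shelf summariser; the BART checkpoint, separator format, and truncation limits are illustrative assumptions, not the paper's benchmarked configuration:

```python
# Illustrative sketch of multi-resource-assisted summarisation input
# construction (not the paper's exact setup): the main article is
# concatenated with assisting documents, separated by a marker token,
# and fed to an off-the-shelf seq2seq summariser.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def assisted_summarise(main_article: str, assisting_docs: list[str]) -> str:
    # Hypothetical input format: main article first, then each
    # assisting document after a separator, truncated to the model limit.
    source = main_article + "".join(" </s> " + d for d in assisting_docs)
    inputs = tokenizer(source, truncation=True, max_length=1024,
                       return_tensors="pt")
    summary_ids = model.generate(**inputs, num_beams=4, max_length=142)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```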

AugNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation
Xinnuo Xu | Guoyin Wang | Young-Bum Kim | Sungjin Lee
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Natural Language Generation (NLG) is a key component in a task-oriented dialogue system, converting a structured meaning representation (MR) into natural language. For large-scale conversational systems, where it is common to have hundreds of intents and thousands of slots, neither template-based approaches nor model-based approaches are scalable. Recently, neural NLG models have started leveraging transfer learning and have shown promising results in few-shot settings. This paper proposes AugNLG, a novel data augmentation approach that combines a self-trained neural retrieval model with a few-shot learned NLU model to automatically create MR-to-Text data from open-domain texts. The proposed system mostly outperforms the state-of-the-art methods on the FewshotWOZ data in both BLEU and Slot Error Rate. We further confirm improved results on the FewshotSGD data and provide comprehensive analysis results on key components of our system. Our code and data are available at https://github.com/XinnuoXu/AugNLG.
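A rough sketch of the two-stage augmentation idea follows; the `embed` and `nlu` callables and the shared threshold are placeholders standing in for the self-trained retrieval and few-shot NLU models, not the released implementation linked above:

```python
# Illustrative sketch of an AugNLG-style augmentation loop (simplified;
# the similarity-based retrieval and confidence filtering here are
# assumptions, not the paper's exact method).
import numpy as np

def augment(seed_pairs, open_domain_texts, embed, nlu, threshold=0.8):
    """seed_pairs: list of (mr, text) in-domain examples.
    embed: text -> vector; nlu: text -> (mr, confidence)."""
    seed_vecs = np.stack([embed(t) for _, t in seed_pairs])
    augmented = []
    for utt in open_domain_texts:
        v = embed(utt)
        # Step 1: retrieval -- keep open-domain utterances that are
        # close to some in-domain seed utterance in embedding space.
        sims = seed_vecs @ v / (np.linalg.norm(seed_vecs, axis=1)
                                * np.linalg.norm(v) + 1e-8)
        if sims.max() < threshold:
            continue
        # Step 2: the few-shot NLU labels the retrieved utterance with
        # an MR, yielding a synthetic MR-to-Text training pair.
        mr, conf = nlu(utt)
        if conf >= threshold:
            augmented.append((mr, utt))
    return augmented
```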

AggGen: Ordering and Aggregating while Generating
Xinnuo Xu | Ondřej Dušek | Verena Rieser | Ioannis Konstas
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present AggGen (pronounced ‘again’), a data-to-text model which re-introduces two explicit sentence-planning stages into neural data-to-text systems: input ordering and input aggregation. In contrast to previous work using sentence planning, our model is still end-to-end: AggGen performs sentence planning at the same time as generating text, by learning latent alignments (via semantic facts) between the input representation and the target text. Experiments on the WebNLG and E2E challenge data show that by using fact-based alignments our approach is more interpretable, expressive, robust to noise, and easier to control, while retaining the advantages of end-to-end systems in terms of fluency. Our code is available at https://github.com/XinnuoXu/AggGen.
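The two planning stages can be illustrated with a toy example; the data structures below are hypothetical, not AggGen's internals. A sentence plan orders the input facts and aggregates them into groups, with one group realised per output sentence:

```python
# Toy illustration of explicit ordering and aggregation over input
# facts (WebNLG-style triples); the plan representation is an
# assumption for illustration only.
facts = [
    ("Aarhus_Airport", "location", "Tirstrup"),
    ("Tirstrup", "country", "Denmark"),
    ("Aarhus_Airport", "runwayLength", "2702.0"),
]

# Ordering + aggregation: the first sentence covers the two location
# facts, the second sentence covers the runway fact.
plan = [[facts[0], facts[1]], [facts[2]]]

for group in plan:
    print("sentence covering:", [predicate for _, predicate, _ in group])
```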

2020

Fact-based Content Weighting for Evaluating Abstractive Summarisation
Xinnuo Xu | Ondřej Dušek | Jingyi Li | Verena Rieser | Ioannis Konstas
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Abstractive summarisation is notoriously hard to evaluate since standard word-overlap-based metrics are insufficient. We introduce a new evaluation metric which is based on fact-level content weighting, i.e. relating the facts of the document to the facts of the summary. We follow the assumption that a good summary will reflect all relevant facts, i.e. the ones present in the ground truth (human-generated reference summary). We confirm this hypothesis by showing that our weightings are highly correlated with human perception and compare favourably to the recent manual highlight-based metric of Hardy et al. (2019).
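As a minimal sketch of the idea (the fact extractor and encoder are placeholders; the paper's actual fact representation differs), each document fact can be weighted by its best match against the summary facts:

```python
# Sketch of fact-level content weighting: a document fact that is well
# reflected by some summary fact receives a high weight. `encode` is an
# assumed fact-to-unit-vector encoder, so dot products are cosines.
import numpy as np

def fact_weights(doc_facts, summary_facts, encode):
    """Return a weight per document fact: its maximum similarity to
    any fact in the summary."""
    S = np.stack([encode(f) for f in summary_facts])
    weights = {}
    for fact in doc_facts:
        v = encode(fact)
        weights[fact] = float((S @ v).max())  # best-matching summary fact
    return weights
```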

2019

Unsupervised Dialogue Spectrum Generation for Log Dialogue Ranking
Xinnuo Xu | Yizhe Zhang | Lars Liden | Sungjin Lee
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Although the data-driven approaches of some recent bot-building platforms make it possible for a wide range of users to easily create dialogue systems, those platforms do not offer tools for quickly identifying which log dialogues contain problems. This is important since corrections to log dialogues provide a means to improve performance after deployment. A log dialogue ranker, which ranks problematic dialogues higher, is an essential tool due to the sheer volume of log dialogues that could be generated. However, training a ranker typically requires labelling a substantial amount of data, which is not feasible for most users. In this paper, we present a novel unsupervised approach for dialogue ranking using GANs and release a corpus of labelled dialogues for evaluation and comparison with supervised methods. The evaluation result shows that our method compares favorably to supervised methods without any labelled data.
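Once such a model is trained, the ranking step itself is straightforward; the sketch below assumes a discriminator that scores how clean a dialogue looks, which is an abstraction of the GAN-based model rather than its exact interface:

```python
# Sketch of discriminator-based log dialogue ranking (interface is an
# assumption): dialogues the discriminator scores as least clean are
# surfaced first for human correction.
def rank_logs(log_dialogues, discriminator):
    """discriminator: dialogue -> probability the dialogue is clean."""
    scored = [(discriminator(d), d) for d in log_dialogues]
    scored.sort(key=lambda pair: pair[0])  # most problematic first
    return [d for _, d in scored]
```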

2018

Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity
Xinnuo Xu | Ondřej Dušek | Ioannis Konstas | Verena Rieser
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity. Experiments on the OpenSubtitles corpus show a substantial improvement over competitive neural models in terms of BLEU score as well as metrics of coherence and diversity.
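Enhancement (1) is simple enough to sketch directly: average the GloVe vectors of the context and of the response, then take their cosine similarity. The `glove` token-to-vector lookup below is an assumed pre-loaded embedding table, not the paper's exact pipeline:

```python
# Minimal sketch of the coherence measure: cosine similarity between
# averaged GloVe embeddings of the dialogue context and the response.
# Assumes each side has at least one in-vocabulary token.
import numpy as np

def coherence(context_tokens, response_tokens, glove):
    def mean_vec(tokens):
        vecs = [glove[t] for t in tokens if t in glove]
        return np.mean(vecs, axis=0)
    c, r = mean_vec(context_tokens), mean_vec(response_tokens)
    return float(c @ r / (np.linalg.norm(c) * np.linalg.norm(r) + 1e-8))
```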