Jie Mei


2021

MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization
Chenguang Zhu | Yang Liu | Jie Mei | Michael Zeng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

This paper introduces MediaSum, a large-scale media interview dataset consisting of 463.6K transcripts with abstractive summaries. To create this dataset, we collect interview transcripts from NPR and CNN and employ the overview and topic descriptions as summaries. Compared with existing public corpora for dialogue summarization, our dataset is an order of magnitude larger and contains complex multi-party conversations from multiple domains. We conduct statistical analysis to demonstrate the unique positional bias exhibited in the transcripts of televised and radioed interviews. We also show that MediaSum can be used in transfer learning to improve a model’s performance on other dialogue summarization tasks.

2016

DalGTM at SemEval-2016 Task 1: Importance-Aware Compositional Approach to Short Text Similarity
Jie Mei | Aminul Islam | Evangelos Milios
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)