Jianfeng Qu
2022
ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization
Jiaan Wang | Fandong Meng | Ziyao Lu | Duo Zheng | Zhixu Li | Jianfeng Qu | Jie Zhou
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
We present ClidSum, a benchmark dataset for building cross-lingual summarization systems on dialogue documents. It consists of 67k+ dialogue documents and 112k+ annotated summaries in different target languages. Based on the proposed ClidSum, we introduce two benchmark settings for supervised and semi-supervised scenarios, respectively. We then build various baseline systems in different paradigms (pipeline and end-to-end) and conduct extensive experiments on ClidSum to provide deeper analyses. Furthermore, we propose mDialBART, which extends mBART via further pre-training, where multiple objectives help the pre-trained model capture the structural characteristics and key content of dialogues as well as the transformation from the source to the target language. Experimental results show that mDialBART, as an end-to-end model, outperforms strong pipeline models on ClidSum. Finally, we discuss the specific challenges that current approaches face on this task and give multiple promising directions for future research. We have released the dataset and code at https://github.com/krystalan/ClidSum.
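To illustrate the end-to-end setting that the abstract contrasts with pipeline baselines (translate-then-summarize or summarize-then-translate), below is a minimal sketch of direct cross-lingual dialogue summarization using a public mBART-50 checkpoint via Hugging Face Transformers. This is not the authors' mDialBART (whose further pre-training objectives and released resources are at the GitHub link above); the checkpoint name, language codes, and the toy dialogue are assumptions chosen for illustration only.

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Public mBART-50 checkpoint, NOT the authors' mDialBART weights.
model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Toy English dialogue; ClidSum pairs dialogues with summaries in other target languages.
dialogue = "A: Did you finish the report? B: Almost, I'll send it to you tonight."
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)

# Force the decoder to start in the target language (German here),
# so the model generates the output directly in that language.
output_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"],
    max_length=64,
    num_beams=4,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])

In practice such a model would first be fine-tuned on cross-lingual dialogue-summary pairs (as in ClidSum's supervised setting); the snippet only shows the single-model, end-to-end generation path.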
2021
Tackling Zero Pronoun Resolution and Non-Zero Coreference Resolution Jointly
Shisong Chen | Binbin Gu | Jianfeng Qu | Zhixu Li | An Liu | Lei Zhao | Zhigang Chen
Proceedings of the 25th Conference on Computational Natural Language Learning
Zero pronoun resolution aims at recognizing dropped pronouns and pointing out their anaphoric mentions, while non-zero coreference resolution aims at clustering mentions that refer to the same entity. Existing efforts often deal with the two problems separately despite their close intrinsic correlation. In this paper, we investigate the possibility of jointly solving zero pronoun resolution and coreference resolution via a novel end-to-end neural model. Specifically, we design a gap-masked self-attention model that encodes gaps and tokens in the same space, where gaps capture valuable contextual information from their surrounding tokens while tokens maintain their original sequential information without disturbance. Additionally, we propose a two-stage interaction mechanism to make full use of the exclusive relationship between zero pronouns and mentions. Our empirical study on the OntoNotes 5.0 Chinese dataset shows that our model outperforms the corresponding state-of-the-art approaches on both tasks.
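To make the gap-masked self-attention idea concrete, here is a minimal PyTorch sketch of one possible reading of the mechanism described above: gap positions and real tokens share a single sequence, and attention keys at gap positions are masked out, so gaps gather context from the surrounding tokens while token representations remain undisturbed by the inserted gaps. The single-head formulation, tensor shapes, and toy input are assumptions for illustration, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def gap_masked_self_attention(hidden, is_gap):
    """Single-head self-attention over a sequence interleaving gaps and tokens.

    hidden: (L, d) float tensor of gap/token encodings.
    is_gap: (L,) bool tensor, True where the position is a candidate gap.

    Keys at gap positions are masked, so gap queries attend only to real
    tokens (absorbing context), and token queries never attend to gaps
    (keeping the original sequence encoding undisturbed).
    """
    d = hidden.size(-1)
    scores = hidden @ hidden.transpose(0, 1) / d ** 0.5      # (L, L)
    scores = scores.masked_fill(is_gap.unsqueeze(0), float("-inf"))
    attn = F.softmax(scores, dim=-1)                          # rows sum to 1
    return attn @ hidden                                      # (L, d)

# Toy usage: 5 positions, with positions 1 and 3 marking candidate gaps.
h = torch.randn(5, 16)
g = torch.tensor([False, True, False, True, False])
out = gap_masked_self_attention(h, g)
print(out.shape)  # torch.Size([5, 16])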