MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

Canwen Xu; Jiaxin Pei; Hongtao Wu; Yiyu Liu; Chenliang Li

doi:10.18653/v1/2020.acl-main.330

Abstract

Recently, large-scale datasets have vastly facilitated development in nearly all domains of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on this rich information, MATINF is applicable to three major NLP tasks: classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline on MATINF to inspire further research. Our comprehensive comparison and experiments on MATINF and other datasets demonstrate the merits of MATINF.
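The idea of a jointly labeled record can be illustrated with a minimal sketch. The abstract states only that each question-answer pair carries a category and a question description; the field names, the toy record, and the exact input/target pairing per task below are assumptions made for illustration, not the dataset's actual schema or the authors' code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Record:
    """One jointly labeled entry (hypothetical field names)."""
    question: str      # short question written by the user
    description: str   # longer user-generated question description
    answer: str        # answer paired with the question
    category: str      # human-labeled category


def to_classification(r: Record) -> Tuple[str, str]:
    # Assumed pairing: question plus description as input, category as label.
    return (f"{r.question} {r.description}", r.category)


def to_qa(r: Record) -> Tuple[str, str]:
    # Assumed pairing: question as input, answer as target.
    return (r.question, r.answer)


def to_summarization(r: Record) -> Tuple[str, str]:
    # Assumed pairing: description as source text, question as its summary.
    return (r.description, r.question)


# Invented toy record, for illustration only (not taken from the dataset).
record = Record(
    question="Is it normal for a newborn to sleep all day?",
    description="My baby is two weeks old and sleeps most of the day, "
                "waking only to feed. Should I be worried?",
    answer="Newborns commonly sleep many hours a day; ask a pediatrician "
           "if feeding becomes difficult.",
    category="0-1 years",
)

for name, convert in [("classification", to_classification),
                      ("qa", to_qa),
                      ("summarization", to_summarization)]:
    source, target = convert(record)
    print(f"{name}: {source[:40]!r} -> {target[:40]!r}")
```

Under these assumptions, a single record yields one training example for each of the three tasks, which is what allows a shared model to be trained on all of them at once.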

Anthology ID: 2020.acl-main.330
Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2020
Address: Online
Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 3586–3596
URL: https://aclanthology.org/2020.acl-main.330
DOI: 10.18653/v1/2020.acl-main.330
Cite (ACL): Canwen Xu, Jiaxin Pei, Hongtao Wu, Yiyu Liu, and Chenliang Li. 2020. MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3586–3596, Online. Association for Computational Linguistics.
Cite (Informal): MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization (Xu et al., ACL 2020)
PDF: https://preview.aclanthology.org/ingest-bitext-workshop/2020.acl-main.330.pdf
Video: http://slideslive.com/38928859
Code: WHUIR/MATINF
Data: MATINF, AG News, DuReader, LCSTS, MS MARCO, NEWSROOM
