A Method for Building a Commonsense Inference Dataset based on Basic Events

Kazumasa Omura, Daisuke Kawahara, Sadao Kurohashi


Abstract
We present a scalable, low-bias, and low-cost method for building a commonsense inference dataset that combines automatic extraction from a corpus with crowdsourcing. Each problem is a multiple-choice question that asks about the contingency between basic events. We applied the proposed method to a Japanese corpus and acquired 104k problems. While humans can solve the resulting problems with high accuracy (88.9%), the accuracy of a high-performance transfer learning model is considerably lower (76.0%). We also confirmed through dataset analysis that the resulting dataset has low bias. We have released the dataset to facilitate language understanding research.
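To make the problem format concrete, the following is a minimal Python sketch of how one multiple-choice contingency problem and the accuracy metric quoted above might be represented. The field names (`context`, `choices`, `label`), the assumption that a problem pairs one antecedent basic event with several candidate consequent events, and the toy English events are all illustrative assumptions, not the released dataset's actual schema or contents.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Problem:
    """One multiple-choice contingency question (hypothetical schema)."""
    context: str               # antecedent basic event (assumed field)
    choices: Tuple[str, ...]   # candidate consequent basic events (assumed field)
    label: int                 # index of the correct choice in `choices`

    def __post_init__(self) -> None:
        # Reject problems whose gold index does not point at a choice.
        if not 0 <= self.label < len(self.choices):
            raise ValueError(
                f"label {self.label} out of range for {len(self.choices)} choices"
            )


def accuracy(problems: Sequence[Problem], predict: Callable[[Problem], int]) -> float:
    """Return the fraction of problems for which `predict` picks the gold choice."""
    if not problems:
        raise ValueError("accuracy is undefined for an empty problem set")
    correct = sum(1 for p in problems if predict(p) == p.label)
    return correct / len(problems)


if __name__ == "__main__":
    # Toy English problems invented for illustration; not taken from the dataset.
    toy = [
        Problem("it rains", ("the road gets wet", "the sun sets", "I buy shoes", "the cat sleeps"), 0),
        Problem("I am hungry", ("I fall asleep", "I eat lunch", "I paint a wall", "it snows"), 1),
    ]
    print(accuracy(toy, lambda p: 0))        # always pick the first choice -> 0.5
    print(accuracy(toy, lambda p: p.label))  # oracle that returns the gold index -> 1.0
```

Measuring a model and human annotators with the same `accuracy` function on the same problems is what makes a comparison such as 88.9% versus 76.0% meaningful: the gap indicates how much headroom the problems leave for models.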
Anthology ID:
2020.emnlp-main.192
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2450–2460
URL:
https://aclanthology.org/2020.emnlp-main.192
DOI:
10.18653/v1/2020.emnlp-main.192
Cite (ACL):
Kazumasa Omura, Daisuke Kawahara, and Sadao Kurohashi. 2020. A Method for Building a Commonsense Inference Dataset based on Basic Events. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2450–2460, Online. Association for Computational Linguistics.
Cite (Informal):
A Method for Building a Commonsense Inference Dataset based on Basic Events (Omura et al., EMNLP 2020)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2020.emnlp-main.192.pdf
Video:
https://slideslive.com/38939260