Contrastive Distant Supervision for Debiased and Denoised Machine Reading Comprehension

Ning Bian; Hongyu Lin; Xianpei Han; Ben He; Le Sun

doi:10.18653/v1/2023.findings-emnlp.457

Abstract

Distant supervision (DS) is a promising learning approach for machine reading comprehension (MRC) because it leverages easily obtained question-answer pairs. Unfortunately, heuristically annotated datasets inevitably contain mislabeled instances, which cause answer bias and context noise problems. To learn debiased and denoised MRC models, this paper proposes the Contrastive Distant Supervision algorithm (CDS), which learns to distinguish confusing and noisy instances via confidence-aware contrastive learning. Specifically, to eliminate answer bias, CDS samples counterfactual negative instances, which ensures that MRC models take both answer information and question-context interaction into consideration. To denoise distantly annotated contexts, CDS samples confusing negative instances to increase the margin between correct and mislabeled instances. We further propose a confidence-aware contrastive loss to model and leverage the uncertainty of all DS instances during learning. Experimental results show that CDS is effective and, without any manual annotations, can even outperform supervised MRC models.
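The abstract describes CDS only at a high level. As a rough illustration of the two ingredients it names, here is a minimal Python sketch under stated assumptions: heuristic DS labeling by exact string matching of the answer in each context, and a generic InfoNCE-style contrastive loss whose per-instance value is scaled by a confidence weight. The function names, the exact-match heuristic, and the way confidence enters the loss are hypothetical stand-ins chosen for illustration; this is not the paper's formulation, and how the negatives (counterfactual or confusing) are sampled is left to the caller.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A character span (start, end) of the answer in a context, or None if absent.
Span = Optional[Tuple[int, int]]


def distant_label(answer: str, contexts: Sequence[str]) -> List[Tuple[str, Span]]:
    """Hypothetical DS annotation: label the first exact match of the answer in each context.

    Any context that merely contains the answer string is labeled positive, even if it
    does not actually support the answer; this is the kind of label noise DS introduces.
    """
    labeled = []
    for ctx in contexts:
        start = ctx.find(answer)  # -1 when the answer string does not occur
        span = (start, start + len(answer)) if start >= 0 else None
        labeled.append((ctx, span))
    return labeled


def confidence_weighted_contrastive_loss(
    pos_score: float,
    neg_scores: Sequence[float],
    confidence: float,
    temperature: float = 1.0,
) -> float:
    """InfoNCE-style loss for one DS instance, scaled by a confidence weight in [0, 1].

    pos_score:  model score for the distantly labeled (possibly wrong) positive instance.
    neg_scores: model scores for sampled negative instances.
    confidence: hypothetical estimate of how likely the DS label is correct.
    Returns confidence * -log(softmax(logits)[0]).
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return confidence * (log_sum_exp - logits[0])


labeled = distant_label("Paris", ["Paris is the capital of France.", "Berlin is in Germany."])
print(labeled[0][1])  # (0, 5)
print(confidence_weighted_contrastive_loss(2.0, [0.5, -1.0], confidence=0.8))
```

Scaling each instance's loss by its confidence means that an instance judged likely to be mislabeled contributes less to training, while sampled negatives push the positive score above theirs; how CDS itself estimates and uses confidence is not reproduced by this sketch.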

Anthology ID: 2023.findings-emnlp.457
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6852–6863
URL: https://preview.aclanthology.org/fix-sig-urls/2023.findings-emnlp.457/
DOI: 10.18653/v1/2023.findings-emnlp.457
Cite (ACL): Ning Bian, Hongyu Lin, Xianpei Han, Ben He, and Le Sun. 2023. Contrastive Distant Supervision for Debiased and Denoised Machine Reading Comprehension. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 6852–6863, Singapore. Association for Computational Linguistics.
Cite (Informal): Contrastive Distant Supervision for Debiased and Denoised Machine Reading Comprehension (Bian et al., Findings 2023)
PDF: https://preview.aclanthology.org/fix-sig-urls/2023.findings-emnlp.457.pdf