Bo Dong


2022

CML: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost
Bo Dong | Yiyi Wang | Hanbo Sun | Yunji Wang | Alireza Hashemi | Zheng Du
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

Deep neural network models are especially susceptible to noise in annotated labels. In the real world, annotated data typically contains noise caused by a variety of factors, such as task difficulty, annotator experience, and annotator bias. Label quality is critical for label validation tasks; however, correcting for noise by collecting more data is often costly. In this paper, we propose a contrastive meta-learning framework (CML) to address the challenges introduced by noisy annotated data, specifically in the context of natural language processing. CML combines contrastive learning and meta-learning to improve the quality of text feature representations. Meta-learning is also used to generate confidence scores to assess label quality. We demonstrate that a model built on CML-filtered data outperforms a model built on clean data. Furthermore, we perform experiments on de-identified commercial voice assistant datasets and demonstrate that our model outperforms several state-of-the-art (SOTA) approaches.
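The abstract does not give CML's exact loss. As a reference point for what "contrastive learning to improve text feature representations" usually means, here is a minimal sketch of one widely used variant, the supervised contrastive (SupCon) loss. It is a swapped-in standard technique, not the paper's method; the function names, the temperature of 0.1, and the toy embeddings are assumptions for illustration. Embeddings that share a label are pulled together and all others are pushed apart, so the loss is low when same-label vectors are already close.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Sequence[float]) -> List[float]:
    # L2-normalise so dot products become cosine similarities (assumes v is non-zero).
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: List[Sequence[float]],
    labels: List[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Generic SupCon loss over one batch (hypothetical helper, not CML's API)."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        logits = {k: dot(z[i], z[k]) / temperature for k in others}
        # Numerically stable log of the softmax denominator over all other samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of each positive for this anchor.
        losses.append(-sum(logits[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Labels that agree with the geometry give a much smaller loss than shuffled ones.
    print(supervised_contrastive_loss(emb, [0, 0, 1, 1]))
    print(supervised_contrastive_loss(emb, [0, 1, 0, 1]))
```

In a noisy-label setting, a mislabeled example behaves like the shuffled case above: its positives sit far away in embedding space, which is one intuition for why representation quality and label-confidence estimation can be coupled.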

2020

Tax Classification of Invoice Details Based on Directed Heterogeneous Graph (基于有向异构图的发票明细税收分类方法)
Peiyao Zhao (赵珮瑶) | Qinghua Zheng (郑庆华) | Bo Dong (董博) | Jianfei Ruan (阮建飞) | Minnan Luo (罗敏楠)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

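Taxation is the material foundation on which the state depends. To accelerate the modernization of taxation and to let taxpayers issue value-added tax (VAT) invoices conveniently and in a standardized way, the State Taxation Administration requires taxpayers to select the tax classification corresponding to each invoice detail (line item) in the tax control system before an invoice can be issued. Improving the accuracy of tax classification is an important foundation for constructing tax risk indicators and for analyzing the behavioral characteristics of taxpayers. On this basis, this paper proposes a short-text classification model based on a directed heterogeneous graph, the Heterogeneous Directed Graph Attention Network (HDGAT), which models the directional information between invoice details and introduces external knowledge, significantly improving the accuracy of tax classification of invoice details.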
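The abstract names the ingredients (graph attention, edge direction, heterogeneous node types) but not HDGAT's exact equations. The sketch below shows one generic way those ingredients can fit together in a single-head, GAT-style layer: each node type gets its own projection matrix, and each node attends only over its in-neighbours plus itself, so reversing an edge changes the result. This is an illustrative assumption, not the paper's architecture; the function name, the node types "invoice_item" and "word", and the weight values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def directed_hetero_gat_layer(
    features: Dict[str, Vector],
    node_types: Dict[str, str],
    edges: List[Tuple[str, str]],  # (source, target): information flows source -> target
    weights: Dict[str, Matrix],    # hypothetical per-node-type projection matrices
    attn: Vector,                  # attention vector of length 2 * output dim
) -> Dict[str, Vector]:
    """Generic directed, type-aware attention layer (hypothetical, not HDGAT itself)."""
    # Project every node with the matrix belonging to its own node type.
    proj = {n: matvec(weights[node_types[n]], h) for n, h in features.items()}
    # Only in-neighbours are collected, so edge direction matters; add a self-loop.
    in_nbrs: Dict[str, List[str]] = {n: [n] for n in features}
    for src, dst in edges:
        in_nbrs[dst].append(src)  # raises KeyError if dst has no features
    out: Dict[str, Vector] = {}
    for i, nbrs in in_nbrs.items():
        # e_ij = LeakyReLU(a . [W h_i || W h_j]); list "+" concatenates the two vectors.
        scores = [leaky_relu(sum(a * v for a, v in zip(attn, proj[i] + proj[j]))) for j in nbrs]
        # Softmax over the neighbourhood (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        dim = len(proj[i])
        # New representation: attention-weighted sum of projected neighbour features.
        out[i] = [sum(alpha * proj[j][k] for alpha, j in zip(alphas, nbrs)) for k in range(dim)]
    return out


if __name__ == "__main__":
    feats = {"item": [1.0, 0.0], "word": [0.0, 1.0]}
    types = {"item": "invoice_item", "word": "word"}
    identity = [[1.0, 0.0], [0.0, 1.0]]
    w = {"invoice_item": identity, "word": identity}
    # One directed edge word -> item: "item" mixes in "word", but not the reverse.
    print(directed_hetero_gat_layer(feats, types, [("word", "item")], w, [0.5, 0.5, 0.5, 0.5]))
```

Restricting attention to in-neighbours is the simplest way to make a message-passing layer direction-sensitive; a real model would typically add multiple heads, nonlinearities, and learned weights, which are omitted here.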