Abstract
“Domain-specific text classification often needs more external knowledge, and fraud cases havefewer descriptions. Existing methods usually utilize single-stage deep models to extract semanticfeatures, which is less reusable. To tackle this issue, we propose a two-stage training frameworkbased on within-task pretraining and multi-dimensional semantic enhancement for CCL23-EvalTask 6 (Telecom Network Fraud Case Classification, FCC). Our training framework is dividedinto two stages. First, we pre-train using the training corpus to obtain specific BERT. The seman-tic mining ability of the model is enhanced from the feature space perspective by introducing ad-versarial training and multiple random sampling. The pseudo-labeled data is generated throughthe test data above a certain threshold. Second, pseudo-labeled samples are added to the trainingset for semantic enhancement based on the sample space dimension. We utilize the same back-bone for prediction to obtain the results. Experimental results show that our proposed methodoutperforms the single-stage benchmarks and achieves competitive performance with 0.859259F1. It also performs better in the few-shot patent classification task with 65.160% F1, whichindicates robustness.”- Anthology ID:
- 2023.ccl-3.23
- Volume:
- Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
- Month:
- August
- Year:
- 2023
- Address:
- Harbin, China
- Editors:
- Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
- Venue:
- CCL
- SIG:
- Publisher:
- Chinese Information Processing Society of China
- Note:
- Pages:
- 206–212
- Language:
- English
- URL:
- https://aclanthology.org/2023.ccl-3.23
- DOI:
- Cite (ACL):
- Guangyu Zheng, Tingting He, Zhenyu Wang, and Haochang Wang. 2023. System Report for CCL23-Eval Task 6: A Method For Telecom Network Fraud Case Classification Based on Two-stage Training Framework and Within-task Pretraining. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations), pages 206–212, Harbin, China. Chinese Information Processing Society of China.
- Cite (Informal):
- System Report for CCL23-Eval Task 6: A Method For Telecom Network Fraud Case Classification Based on Two-stage Training Framework and Within-task Pretraining (Zheng et al., CCL 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.ccl-3.23.pdf