Abstract
This paper describes our system for SemEval-2022 Task 8: Multilingual News Article Similarity. We propose a linguistics-inspired model trained with a few task-specific strategies. The main techniques of our system are: 1) data augmentation, 2) multi-label loss, 3) adapted R-Drop, and 4) sample reconstruction with head-tail combination. We also present a brief analysis of some methods that did not help, such as a two-tower architecture. Our system ranked 1st on the leaderboard, achieving a Pearson’s Correlation Coefficient of 0.818 on the official evaluation set.
- Anthology ID:
- 2022.semeval-1.157
- Volume:
- Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Venue:
- SemEval
- SIGs:
- SIGLEX | SIGSEM
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1114–1120
- URL:
- https://aclanthology.org/2022.semeval-1.157
- DOI:
- 10.18653/v1/2022.semeval-1.157
- Cite (ACL):
- Zihang Xu, Ziqing Yang, Yiming Cui, and Zhigang Chen. 2022. HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1114–1120, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity (Xu et al., SemEval 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.semeval-1.157.pdf
- Code:
- geekdream-x/semeval2022-task8-tonyx