Abstract
Arabic diacritization is a fundamental task for Arabic language processing. Previous studies have demonstrated that automatically generated knowledge can be helpful to this task. However, these studies regard the auto-generated knowledge instances as gold references, which limits their effectiveness, since such knowledge is not always accurate and inferior instances can lead to incorrect predictions. In this paper, we propose to use regularized decoding and adversarial training to appropriately learn from such noisy knowledge for diacritization. Experimental results on two benchmark datasets show that, even with quite flawed auto-generated knowledge, our model can still learn adequate diacritics and outperform all previous studies on both datasets.
- Anthology ID:
- 2021.acl-short.68
- Volume:
- Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
- Venues:
- ACL | IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 534–542
- URL:
- https://aclanthology.org/2021.acl-short.68
- DOI:
- 10.18653/v1/2021.acl-short.68
- Cite (ACL):
- Han Qin, Guimin Chen, Yuanhe Tian, and Yan Song. 2021. Improving Arabic Diacritization with Regularized Decoding and Adversarial Training. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 534–542, Online. Association for Computational Linguistics.
- Cite (Informal):
- Improving Arabic Diacritization with Regularized Decoding and Adversarial Training (Qin et al., ACL-IJCNLP 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2021.acl-short.68.pdf
- Code:
- synlp/AD-RDAT
- Data:
- Arabic Text Diacritization