Abstract
We present our deep learning system submitted to MADAR shared task 2, which focuses on Twitter user dialect identification. We develop tweet-level identification models based on GRUs and BERT in supervised and semi-supervised settings. We then introduce a simple yet effective method of porting tweet-level labels to the level of users. Our system ranks first in the competition, with a macro F1 score of 71.70% and an accuracy of 77.40%.
- Anthology ID:
- W19-4637
- Volume:
- Proceedings of the Fourth Arabic Natural Language Processing Workshop
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 279–284
- URL:
- https://aclanthology.org/W19-4637
- DOI:
- 10.18653/v1/W19-4637
- Cite (ACL):
- Chiyu Zhang and Muhammad Abdul-Mageed. 2019. No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 279–284, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects (Zhang & Abdul-Mageed, WANLP 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W19-4637.pdf
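
The abstract mentions porting tweet-level labels to the level of users but does not describe how this is done. As a rough illustration only, one common way to do this is a majority vote over a user's tweet-level predictions. The sketch below assumes that approach; it is not taken from the paper, and the function names, user IDs, and labels are hypothetical placeholders.

```python
from collections import Counter


def user_label_by_majority(tweet_labels):
    """Return the most frequent tweet-level label for one user.

    Assumption: majority voting stands in for the paper's porting
    method, which the abstract does not specify.
    """
    if not tweet_labels:
        raise ValueError("need at least one tweet-level prediction")
    counts = Counter(tweet_labels)
    # On ties, most_common keeps the label that was seen first.
    return counts.most_common(1)[0][0]


def port_labels_to_users(predictions):
    """Group (user_id, label) pairs by user and vote within each group."""
    labels_by_user = {}
    for user_id, label in predictions:
        labels_by_user.setdefault(user_id, []).append(label)
    return {
        user_id: user_label_by_majority(labels)
        for user_id, labels in labels_by_user.items()
    }


# Hypothetical tweet-level predictions for two users.
example_predictions = [
    ("user_a", "Egypt"),
    ("user_a", "Iraq"),
    ("user_a", "Egypt"),
    ("user_b", "Iraq"),
]
user_labels = port_labels_to_users(example_predictions)
```

Here `user_labels` maps each user to a single label. A real system might instead average class probabilities across tweets or weight tweets by model confidence; the vote is simply the shortest form of the idea.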