Abstract
This paper describes the IUCL system at VarDial 2019 evaluation campaign for the task of discriminating between Mainland and Taiwan variation of mandarin Chinese. We first build several base classifiers, including a Naive Bayes classifier with word n-gram as features, SVMs with both character and syntactic features, and neural networks with pre-trained character/word embeddings. Then we adopt ensemble methods to combine output from base classifiers to make final predictions. Our ensemble models achieve the highest F1 score (0.893) in simplified Chinese track and the second highest (0.901) in traditional Chinese track. Our results demonstrate the effectiveness and robustness of the ensemble methods.- Anthology ID:
- W19-1417
- Volume:
- Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- June
- Year:
- 2019
- Address:
- Ann Arbor, Michigan
- Editors:
- Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
- Venue:
- VarDial
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 165–171
- Language:
- URL:
- https://aclanthology.org/W19-1417
- DOI:
- 10.18653/v1/W19-1417
- Cite (ACL):
- Hai Hu, Wen Li, He Zhou, Zuoyu Tian, Yiwen Zhang, and Liang Zou. 2019. Ensemble Methods to Distinguish Mainland and Taiwan Chinese. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 165–171, Ann Arbor, Michigan. Association for Computational Linguistics.
- Cite (Informal):
- Ensemble Methods to Distinguish Mainland and Taiwan Chinese (Hu et al., VarDial 2019)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/W19-1417.pdf