Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM

Yow-Ting Shiue, Hen-Hsen Huang, Hsin-Hsi Chen


Abstract
Selecting appropriate words to compose a sentence is a common problem faced by non-native Chinese learners. In this paper, we propose (bidirectional) LSTM sequence labeling models and explore various features to detect word usage errors in Chinese sentences. By combining CWINDOW word embedding features and POS information, the best bidirectional LSTM model achieves an accuracy of 0.5138 and an MRR of 0.6789 on the HSK dataset. For 80.79% of the test data, the model ranks the ground truth within the top two at the position level.
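The three figures reported in the abstract (accuracy, MRR, and the top-two rate) can all be derived from the rank the model assigns to the true error position in each sentence. The following is a minimal sketch, not the authors' evaluation code. It assumes that accuracy means the gold position is ranked first, that MRR is the mean of the reciprocal 1-based rank, and that ties are broken against the model. The names `rank_of_gold` and `ranking_metrics` are hypothetical.

```python
from typing import Dict, Sequence


def rank_of_gold(scores: Sequence[float], gold_index: int) -> int:
    """Return the 1-based rank of the gold position when positions are
    sorted by descending error score (assumed scoring convention)."""
    gold_score = scores[gold_index]
    # Count positions scored strictly higher than the gold position.
    return 1 + sum(1 for s in scores if s > gold_score)


def ranking_metrics(gold_ranks: Sequence[int], k: int = 2) -> Dict[str, float]:
    """Compute top-1 accuracy, MRR, and hit@k from 1-based gold ranks."""
    if not gold_ranks:
        raise ValueError("gold_ranks must not be empty")
    n = len(gold_ranks)
    accuracy = sum(1 for r in gold_ranks if r == 1) / n
    mrr = sum(1.0 / r for r in gold_ranks) / n
    hit_at_k = sum(1 for r in gold_ranks if r <= k) / n
    return {"accuracy": accuracy, "mrr": mrr, "hit_at_k": hit_at_k}


if __name__ == "__main__":
    # Toy ranks for four sentences; these are illustrative, not paper data.
    print(ranking_metrics([1, 2, 1, 4]))
```

With the toy ranks 1, 2, 1, 4, this gives an accuracy of 0.5, an MRR of 0.6875, and a top-two rate of 0.75.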
Anthology ID:
P17-2064
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
404–410
URL:
https://aclanthology.org/P17-2064
DOI:
10.18653/v1/P17-2064
Cite (ACL):
Yow-Ting Shiue, Hen-Hsen Huang, and Hsin-Hsi Chen. 2017. Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 404–410, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM (Shiue et al., ACL 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/P17-2064.pdf
Dataset:
P17-2064.Datasets.zip