Sub-character Neural Language Modelling in Japanese

Viet Nguyen; Julian Brooke; Timothy Baldwin

doi:10.18653/v1/W17-4122

Abstract

In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using sub-characters for language modelling in Japanese. We decompose characters according to a range of character decomposition datasets and train a neural language model over the resulting, variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of sub-characters, though this result depends on a good choice of decomposition dataset and an appropriate granularity of decomposition.
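As a rough illustration of what "decomposing characters" at different granularities can mean, the sketch below expands each character of a string into its components down to a chosen depth. The table `TOY_DECOMP` and the helpers `decompose` and `decompose_text` are hypothetical names written for this example; the table is hand-made and is not one of the decomposition datasets evaluated in the paper, and the paper's actual preprocessing may differ.

```python
# Toy decomposition table, hand-written for illustration only.
# It is NOT one of the decomposition datasets evaluated in the paper.
TOY_DECOMP = {
    "明": ["日", "月"],
    "森": ["木", "林"],
    "林": ["木", "木"],
}


def decompose(char, table, depth):
    """Expand `char` into sub-characters, recursing at most `depth` levels."""
    if depth == 0 or char not in table:
        return [char]
    parts = []
    for part in table[char]:
        parts.extend(decompose(part, table, depth - 1))
    return parts


def decompose_text(text, table, depth):
    """Flatten a string into the token sequence a language model would see."""
    tokens = []
    for ch in text:
        tokens.extend(decompose(ch, table, depth))
    return tokens


if __name__ == "__main__":
    for depth in (0, 1, 2):
        print(depth, decompose_text("明るい森", TOY_DECOMP, depth))
```

With `depth=0` the text stays at the character level; larger depths give progressively finer sub-character sequences, which is the granularity choice the abstract refers to.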

Anthology ID: W17-4122
Volume: Proceedings of the First Workshop on Subword and Character Level Models in NLP
Month: September
Year: 2017
Address: Copenhagen, Denmark
Editors: Manaal Faruqui, Hinrich Schuetze, Isabel Trancoso, Yadollah Yaghoobzadeh
Venue: SCLeM
Publisher: Association for Computational Linguistics
Pages: 148–153
URL: https://preview.aclanthology.org/nschneid-patch-2/W17-4122/
DOI: 10.18653/v1/W17-4122
Cite (ACL): Viet Nguyen, Julian Brooke, and Timothy Baldwin. 2017. Sub-character Neural Language Modelling in Japanese. In Proceedings of the First Workshop on Subword and Character Level Models in NLP, pages 148–153, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Sub-character Neural Language Modelling in Japanese (Nguyen et al., SCLeM 2017)
PDF: https://preview.aclanthology.org/nschneid-patch-2/W17-4122.pdf
