Viet Nguyen

2017

Sub-character Neural Language Modelling in Japanese
Viet Nguyen | Julian Brooke | Timothy Baldwin
Proceedings of the First Workshop on Subword and Character Level Models in NLP

In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using sub-characters for language modelling in Japanese. This is achieved by decomposing characters according to a range of character decomposition datasets, and training a neural language model over variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of sub-characters, though this result depends on a good choice of decomposition dataset and the appropriate granularity of decomposition.
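As a rough illustration of what decomposing characters at different granularities could look like, here is a minimal Python sketch. It is not the authors' implementation: the table `TOY_DECOMPOSITIONS`, the functions `decompose_char` and `decompose_text`, and the `depth` parameter are all hypothetical names, and the three table entries are hand-picked examples rather than data from any of the decomposition datasets the abstract refers to.

```python
# Toy decomposition table (illustrative assumption, not from the paper's datasets).
TOY_DECOMPOSITIONS = {
    "明": ["日", "月"],
    "林": ["木", "木"],
    "森": ["木", "林"],
}


def decompose_char(ch, table, depth):
    """Expand ch into sub-characters, recursing at most `depth` levels."""
    if depth <= 0 or ch not in table:
        # Either no further decomposition is requested or none is known.
        return [ch]
    parts = []
    for part in table[ch]:
        parts.extend(decompose_char(part, table, depth - 1))
    return parts


def decompose_text(text, table, depth=1):
    """Turn a string into a flat token sequence of (sub-)characters."""
    tokens = []
    for ch in text:
        tokens.extend(decompose_char(ch, table, depth))
    return tokens


if __name__ == "__main__":
    for depth in (0, 1, 2):
        print(depth, decompose_text("森", TOY_DECOMPOSITIONS, depth))
```

Here `depth` stands in for the granularity of decomposition: `depth=0` leaves characters intact, while larger values expand components further. The resulting token sequence is what a language model would then be trained over in a setup of this kind.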

Co-authors

Timothy Baldwin 1
Julian Brooke 1

Venues

sclem1
