Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging

Andrew Matteson; Chanhee Lee; Youngbum Kim; Heui-Seok Lim


Abstract

Because Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically uses sub-character features known as graphemes or otherwise relies on comprehensive prior linguistic knowledge (i.e., a dictionary of known morphological transformation forms, or actions). These models were built on the assumption that character-level, dictionary-less morphological analysis was intractable because of the number of actions it would require. In this study, we present a multi-stage action-based model that can perform morphological transformation and part-of-speech tagging over arbitrary units of input, and we apply it to character-level Korean morphological analysis. Using our proposed data-driven Bi-LSTM model on the Sejong Korean corpus, we achieve state-of-the-art word-level and sentence-level tagging accuracy among models that do not employ prior linguistic knowledge.
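The abstract contrasts character-level input with the sub-character (grapheme) features used by earlier Korean analyzers. As an illustration only, not the authors' code, the sketch below shows the standard Unicode arithmetic that splits a precomposed Hangul syllable into its conjoining jamo graphemes. The function names `decompose_syllable` and `to_graphemes` are hypothetical.

```python
# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE = 0xAC00   # first precomposed syllable (가)
L_BASE = 0x1100   # first leading consonant jamo
V_BASE = 0x1161   # first vowel jamo
T_BASE = 0x11A7   # one before the first trailing consonant jamo
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_syllable(ch: str) -> list[str]:
    """Split one precomposed Hangul syllable into conjoining jamo.

    Characters outside the Hangul syllable block are returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [ch]
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    jamo = [lead, vowel]
    t_index = s_index % T_COUNT
    if t_index:  # 0 means the syllable has no final consonant
        jamo.append(chr(T_BASE + t_index))
    return jamo


def to_graphemes(text: str) -> list[str]:
    """Flatten a string into a grapheme (jamo) sequence."""
    return [j for ch in text for j in decompose_syllable(ch)]


if __name__ == "__main__":
    word = "한국어"
    print(list(word), "->", to_graphemes(word))
```

For example, the three characters of 한국어 expand to eight jamo (3 + 3 + 2), which shows the trade-off between input units: a grapheme-level model sees longer sequences over a small symbol inventory, while a character-level model, as proposed in the abstract, sees shorter sequences over a much larger one.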

Anthology ID: C18-1210
Volume: Proceedings of the 27th International Conference on Computational Linguistics
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 2482–2492
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/C18-1210/
Cite (ACL): Andrew Matteson, Chanhee Lee, Youngbum Kim, and Heuiseok Lim. 2018. Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2482–2492, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging (Matteson et al., COLING 2018)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/C18-1210.pdf