Korean Language Modeling via Syntactic Guide

Hyeondey Kim, Seonhoon Kim, Inho Kang, Nojun Kwak, Pascale Fung


Abstract
Although pre-trained language models play a vital role in modern language processing tasks, not every language can benefit from them. Most existing research on pre-trained language models focuses primarily on widely used languages such as English, other Indo-European languages, and Chinese. Additionally, such schemes usually require extensive computational resources alongside a large amount of data, which is infeasible for less widely used languages. We aim to address this research gap by building a language model that captures the linguistic phenomena of the target language and can be trained with limited resources. In this paper, we discuss Korean language modeling, specifically methods for language representation and pre-training. With our Korean-specific language representation, we are able to build more powerful language models for Korean understanding, even with fewer resources. We propose chunk-wise reconstruction of the Korean language based on the widely used Transformer architecture and bidirectional language representation. We also incorporate morphological features such as part-of-speech (PoS) tags into language understanding by leveraging this information during pre-training. Our experimental results show that the proposed methods improve model performance on the investigated Korean language understanding tasks.
Anthology ID:
2022.lrec-1.304
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2841–2849
URL:
https://aclanthology.org/2022.lrec-1.304
Cite (ACL):
Hyeondey Kim, Seonhoon Kim, Inho Kang, Nojun Kwak, and Pascale Fung. 2022. Korean Language Modeling via Syntactic Guide. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2841–2849, Marseille, France. European Language Resources Association.
Cite (Informal):
Korean Language Modeling via Syntactic Guide (Kim et al., LREC 2022)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2022.lrec-1.304.pdf
Data:
KorNLI