Diversifying language models for lesser-studied languages and language-usage contexts: A case of second language Korean

Hakyung Sung; Gyu-Ho Shin

doi:10.18653/v1/2023.findings-emnlp.767

Abstract

This study investigates the extent to which currently available morpheme parsers/taggers apply to lesser-studied languages and language-usage contexts, with a focus on second language (L2) Korean. We pursue this inquiry by (1) training a neural-network model (pre-trained on first language [L1] Korean data) on varying L2 datasets and (2) measuring its morpheme parsing/POS tagging performance on L2 test sets drawn both from the same sources as the L2 training sets and from different sources. Results show that the L2-trained models generally outperform the L1 pre-trained baseline model in domain-specific tokenization and POS tagging. Interestingly, increasing the size of the L2 training data does not consistently improve model performance.

Anthology ID: 2023.findings-emnlp.767
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11461–11473
URL: https://preview.aclanthology.org/fix-sig-urls/2023.findings-emnlp.767/
DOI: 10.18653/v1/2023.findings-emnlp.767
Cite (ACL): Hakyung Sung and Gyu-Ho Shin. 2023. Diversifying language models for lesser-studied languages and language-usage contexts: A case of second language Korean. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 11461–11473, Singapore. Association for Computational Linguistics.
Cite (Informal): Diversifying language models for lesser-studied languages and language-usage contexts: A case of second language Korean (Sung & Shin, Findings 2023)
PDF: https://preview.aclanthology.org/fix-sig-urls/2023.findings-emnlp.767.pdf
