Towards L2-friendly pipelines for learner corpora: A case of written production by L2-Korean learners

Hakyung Sung; Gyu-Ho Shin

doi:10.18653/v1/2023.bea-1.6

Abstract

We introduce the Korean-Learner-Morpheme (KLM) corpus, a manually annotated dataset consisting of 129,784 morphemes from second language (L2) learners of Korean, featuring morpheme tokenization and part-of-speech (POS) tagging. We evaluate the performance of four Korean morphological analyzers in tokenization and POS tagging on the L2-Korean corpus. Results highlight the analyzers’ reduced performance on L2 data, indicating the limitations of advanced deep-learning models when dealing with L2-Korean corpora. We further show that fine-tuning one of the models with the KLM corpus improves its tokenization and POS tagging accuracy on the L2-Korean dataset.

Anthology ID: 2023.bea-1.6
Volume: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
Venue: BEA
SIG: SIGEDU
Publisher: Association for Computational Linguistics
Pages: 72–82
URL: https://preview.aclanthology.org/fix-sig-urls/2023.bea-1.6/
DOI: 10.18653/v1/2023.bea-1.6
Cite (ACL): Hakyung Sung and Gyu-Ho Shin. 2023. Towards L2-friendly pipelines for learner corpora: A case of written production by L2-Korean learners. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 72–82, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Towards L2-friendly pipelines for learner corpora: A case of written production by L2-Korean learners (Sung & Shin, BEA 2023)
PDF: https://preview.aclanthology.org/fix-sig-urls/2023.bea-1.6.pdf
