Seunghyun Lim


2021

pdf
Analysis of Zero-Shot Crosslingual Learning between English and Korean for Named Entity Recognition
Jongin Kim | Nayoung Choi | Seunghyun Lim | Jungwhan Kim | Soojin Chung | Hyunsoo Woo | Min Song | Jinho D. Choi
Proceedings of the 1st Workshop on Multilingual Representation Learning

This paper presents a English-Korean parallel dataset that collects 381K news articles where 1,400 of them, comprising 10K sentences, are manually labeled for crosslingual named entity recognition (NER). The annotation guidelines for the two languages are developed in parallel, that yield the inter-annotator agreement scores of 91 and 88% for English and Korean respectively, indicating sublime quality annotation in our dataset. Three types of crosslingual learning approaches, direct model transfer, embedding projection, and annotation projection, are used to develop zero-shot Korean NER models. Our best model gives the F1-score of 51% that is very encouraging, considering the extremely distinct natures of these two languages. This is pioneering work that explores zero-shot cross-lingual learning between English and Korean and provides rich parallel annotation for a core NLP task such as named entity recognition.

pdf
Papago’s Submission for the WMT21 Quality Estimation Shared Task
Seunghyun Lim | Hantae Kim | Hyunjoong Kim
Proceedings of the Sixth Conference on Machine Translation

This paper describes Papago submission to the WMT 2021 Quality Estimation Task 1: Sentence-level Direct Assessment. Our multilingual Quality Estimation system explores the combination of Pretrained Language Models and Multi-task Learning architectures. We propose an iterative training pipeline based on pretraining with large amounts of in-domain synthetic data and finetuning with gold (labeled) data. We then compress our system via knowledge distillation in order to reduce parameters yet maintain strong performance. Our submitted multilingual systems perform competitively in multilingual and all 11 individual language pair settings including zero-shot.