Ji Yoon Han


2020

pdf
Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean
Tae Hwan Oh | Ji Yoon Han | Hyonsu Choe | Seokwon Park | Han He | Jinho D. Choi | Na-Rae Han | Jena D. Hwang | Hansaem Kim
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

In this paper, we first open on important issues regarding the Penn Korean Universal Treebank (PKT-UD) and address these issues by revising the entire corpus manually with the aim of producing cleaner UD annotations that are more faithful to Korean grammar. For compatibility to the rest of UD corpora, we follow the UDv2 guidelines, and extensively revise the part-of-speech tags and the dependency relations to reflect morphological features and flexible word- order aspects in Korean. The original and the revised versions of PKT-UD are experimented with transformer-based parsing models using biaffine attention. The parsing model trained on the revised corpus shows a significant improvement of 3.0% in labeled attachment score over the model trained on the previous corpus. Our error analysis demonstrates that this revision allows the parsing model to learn relations more robustly, reducing several critical errors that used to be made by the previous model.

pdf
Crowdsourcing in the Development of a Multilingual FrameNet: A Case Study of Korean FrameNet
Younggyun Hahm | Youngbin Noh | Ji Yoon Han | Tae Hwan Oh | Hyonsu Choe | Hansaem Kim | Key-Sun Choi
Proceedings of the Twelfth Language Resources and Evaluation Conference

Using current methods, the construction of multilingual resources in FrameNet is an expensive and complex task. While crowdsourcing is a viable alternative, it is difficult to include non-native English speakers in such efforts as they often have difficulty with English-based FrameNet tools. In this work, we investigated cross-lingual issues in crowdsourcing approaches for multilingual FrameNets, specifically in the context of the newly constructed Korean FrameNet. To accomplish this, we evaluated the effectiveness of various crowdsourcing settings whereby certain types of information are provided to workers, such as English definitions in FrameNet or translated definitions. We then evaluated whether the crowdsourced results accurately captured the meaning of frames both cross-culturally and cross-linguistically, and found that by allowing the crowd workers to make intuitive choices, they achieved a quality comparable to that of trained FrameNet experts (F1 > 0.75). The outcomes of this work are now publicly available as a new release of Korean FrameNet 1.1.

pdf
Annotation Issues in Universal Dependencies for Korean and Japanese
Ji Yoon Han | Tae Hwan Oh | Lee Jin | Hansaem Kim
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

To investigate issues that arise in the process of developing a Universal Dependency (UD) treebank for Korean and Japanese, we begin by addressing the typological characteristics of Korean and Japanese. Both Korean and Japanese are agglutinative and head-final languages. And the principle of word segmentation for both languages is different from English, which makes it difficult to apply UD guidelines. Following the typological characteristics of the two languages and the issue of UD application, we review the application of UPOS and DEPREL schemes to the two languages. The annotation principles for AUX, ADJ, DET, ADP and PART are discussed for the UPOS scheme, and the annotation principles for case, aux, iobj, and obl are discussed for the DEPREL scheme.