Li-Hsin Chang


2021

Fine-grained Named Entity Annotation for Finnish
Jouni Luoma | Li-Hsin Chang | Filip Ginter | Sampo Pyysalo
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We introduce a corpus with fine-grained named entity annotation for Finnish, following the OntoNotes guidelines to create a resource that is cross-lingually compatible with existing annotations for other languages. We combine and extend two NER corpora recently introduced for Finnish and revise their custom annotation scheme through a combination of automatic and manual processing steps. The resulting corpus consists of nearly 500,000 tokens annotated for over 50,000 mentions categorized into the 18 OntoNotes name and numeric entity types. We evaluate this resource and demonstrate its compatibility with the English OntoNotes annotations by training state-of-the-art mono-, bi- and multilingual deep learning models, finding both that the corpus allows highly accurate recognition of OntoNotes types at 93% F-score and that a comparable level of tagging accuracy can be achieved by a bilingual Finnish-English NER model.

Finnish Paraphrase Corpus
Jenna Kanerva | Filip Ginter | Li-Hsin Chang | Iiro Rastas | Valtteri Skantsi | Jemina Kilpeläinen | Hanna-Mari Kupari | Jenna Saarni | Maija Sevón | Otto Tarkka
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headlines. Of all paraphrase pairs in our corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility for high-quality paraphrase selection in terms of both cost and quality.

Quantitative Evaluation of Alternative Translations in a Corpus of Highly Dissimilar Finnish Paraphrases
Li-Hsin Chang | Sampo Pyysalo | Jenna Kanerva | Filip Ginter
Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age