Jan Kostkan
2024
Comparing Tools for Sentiment Analysis of Danish Literature from Hymns to Fairy Tales: Low-Resource Language and Domain Challenges
Pascale Feldkamp | Jan Kostkan | Ea Overgaard | Mia Jacobsen | Yuri Bizzoni
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
While Sentiment Analysis has become increasingly central in computational approaches to literary texts, the literary domain still poses important challenges for the detection of textual sentiment due to its highly complex use of language and devices, from subtle humor to poetic imagery. Furthermore, these challenges are only further amplified in low-resource language and domain settings. In this paper, we investigate the application and efficacy of different Sentiment Analysis tools on Danish literary texts, using historical fairy tales and religious hymns as our datasets. The scarcity of linguistic resources for Danish and the historical context of the data further compound the challenges for the tools. We compare human annotations to the continuous valence scores of both transformer- and dictionary-based Sentiment Analysis methods to assess their performance, seeking to understand how distinct methods handle the language of Danish prose and poetry.
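A minimal sketch of the kind of comparison the abstract describes, assuming sentence-level human valence annotations and tool scores are available as parallel lists; the data values, variable names, and the choice of Spearman rank correlation are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Illustrative sketch only: compares hypothetical human valence annotations
# with scores from two hypothetical Sentiment Analysis methods.
from scipy.stats import spearmanr

# Hypothetical sentence-level valence scores (not data from the paper).
human_valence = [0.8, -0.3, 0.1, -0.7, 0.5]
dictionary_scores = [0.6, -0.1, 0.0, -0.5, 0.4]   # e.g. a lexicon-based tool
transformer_scores = [0.7, -0.4, 0.2, -0.6, 0.6]  # e.g. a fine-tuned transformer

for name, scores in [("dictionary", dictionary_scores),
                     ("transformer", transformer_scores)]:
    rho, p = spearmanr(human_valence, scores)
    print(f"{name}: Spearman rho = {rho:.2f} (p = {p:.3f})")
```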
2023
OdyCy – A general-purpose NLP pipeline for Ancient Greek
Jan Kostkan | Márton Kardos | Jacob Palle Bliddal Mortensen | Kristoffer Laigaard Nielbo
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
This paper presents a general-purpose NLP pipeline that achieves state-of-the-art performance on the Ancient Greek Perseus UD Treebank for several tasks (POS Tagging, Morphological Analysis, and Dependency Parsing), and close to state-of-the-art performance on the Proiel UD Treebank. Our aim is to provide a reproducible, open-source language processing pipeline for Ancient Greek, capable of handling input texts of varying quality. We measure the performance of our model against other comparable tools and then evaluate lemmatization errors.
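A minimal usage sketch, assuming the pipeline is distributed as an installable spaCy model; the package name grc_odycy_joint_sm and the example sentence are assumptions for illustration, not taken from the paper itself.

```python
# Illustrative sketch: loading an installed odyCy spaCy model and reading off
# lemmas, POS tags, morphological features, and dependency relations.
import spacy

# The model name below is an assumption about the distributed package name.
nlp = spacy.load("grc_odycy_joint_sm")
doc = nlp("μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος")

for token in doc:
    # token.morph holds the morphological analysis; token.dep_ / token.head
    # encode the dependency parse produced by the pipeline.
    print(token.text, token.lemma_, token.pos_, token.morph,
          token.dep_, token.head.text)
```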