Timen Stepišnik-Perdih


2022

Sentiment Classification by Incorporating Background Knowledge from Financial Ontologies
Timen Stepišnik-Perdih | Andraž Pelicon | Blaž Škrlj | Martin Žnidaršič | Igor Lončarski | Senja Pollak
Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022

Ontologies have been increasingly used for machine reasoning over the last few years. They can provide explanations of concepts or be used for concept classification if there exists a mapping from the desired labels to the relevant ontology. This paper presents a practical use of an ontology for the purpose of data set generalization in an oversampling setting, with the aim of improving classification models. We demonstrate our solution on a novel financial sentiment data set using the Financial Industry Business Ontology (FIBO). The results show that generalization-based data enrichment benefits simpler models in a general setting and more complex models such as BERT in a low-data setting.

2021

Interesting cross-border news discovery using cross-lingual article linking and document similarity
Boshko Koloski | Elaine Zosa | Timen Stepišnik-Perdih | Blaž Škrlj | Tarmo Paju | Senja Pollak
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Team Name: team-8
Embeddia Tool: Cross-Lingual Document Retrieval (Zosa et al.)
Dataset: Estonian and Latvian news datasets
Abstract: Contemporary news media face increasing amounts of available data that can be of use when prioritizing, selecting and discovering news. In this work we propose a methodology for retrieving interesting articles in a cross-border news discovery setting. More specifically, we explore how a set of seed documents in Estonian can be projected into the Latvian document space and serve as a basis for the discovery of novel, interesting pieces of Latvian news that would interest Estonian readers. The proposed methodology was evaluated by an Estonian journalist, who confirmed that in the best setting, half of the top 10 retrieved Latvian documents represent news that is potentially interesting enough to be picked up by the Estonian media house and presented to Estonian readers.