Damir Korenčić

Also published as: Damir Korencic

2022

IRB-NLP at SemEval-2022 Task 1: Exploring the Relationship Between Words and Their Semantic Representations
Damir Korenčić | Ivan Grubisic
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

What is the relation between a word and its description, or a word and its embedding? Both descriptions and embeddings are semantic representations of words. But what information from the original word remains in these representations? Or, more importantly, which information about a word do these two representations share? Definition Modeling and Reverse Dictionary are two opposite learning tasks that address these questions. The goal of the Definition Modeling task is to investigate how well the information contained in a word embedding can express the meaning of the word in a human-understandable way, as a dictionary definition. Conversely, the Reverse Dictionary task explores the ability to predict a word's embedding directly from its definition. In this paper, by tackling these two tasks, we explore the relationship between words and their semantic representations. We present our findings based on the descriptive, exploratory, and predictive data analysis conducted on the CODWOE dataset. We give a detailed overview of the systems that we designed for the Definition Modeling and Reverse Dictionary tasks, which achieved top scores in several subtasks of the SemEval-2022 CODWOE challenge. We hope that our experimental results concerning the predictive models and the data analyses we provide will prove useful in future explorations of word representations and their relationships.

2021

To Block or not to Block: Experiments with Machine Learning for News Comment Moderation
Damir Korencic | Ipek Baris | Eugenia Fernandez | Katarina Leuschel | Eva Sánchez Salido
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Today, news media organizations regularly engage with readers by enabling them to comment on news articles. This creates the need for comment moderation and removal of disallowed comments, a time-consuming task often performed by human moderators. In this paper, we approach the problem of automatic news comment moderation as classification of comments into blocked and not blocked categories. We construct a novel dataset of annotated English comments, experiment with cross-lingual transfer of comment labels, and evaluate several machine learning models on datasets of Croatian and Estonian news comments. Team name: SuperAdmin; Challenge: Detection of blocked comments; Tools/models: CroSloEn BERT, FinEst BERT, 24Sata comment dataset, Ekspress comment dataset.

2013

Aspect-Oriented Opinion Mining from User Reviews in Croatian
Goran Glavaš | Damir Korenčić | Jan Šnajder
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing

Co-authors

Jan Šnajder 1

Ivan Grubisic 1