Roman Kern

2022

pdf abs
Causal Investigation of Public Opinion during the COVID-19 Pandemic via Social Media Text
Michael Jantscher | Roman Kern
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Understanding the needs and fears of citizens, especially during a pandemic such as COVID-19, is essential for any government or legislative entity. An effective COVID-19 strategy further requires that the public understand and accept the restriction plans imposed by these entities. In this paper, we explore a causal mediation scenario in which we want to emphasize the use of NLP methods in combination with methods from economics and social sciences. Based on sentiment analysis of Tweets towards the current COVID-19 situation in the UK and Sweden, we conduct several causal inference experiments and attempt to decouple the effect of government restrictions on mobility behavior from the effect that occurs due to public perception of the COVID-19 strategy in a country. To avoid biased results we control for valid country specific epidemiological and time-varying confounders. Comprehensive experiments show that not all changes in mobility are caused by countries implemented policies but also by the support of individuals in the fight against this pandemic. We find that social media texts are an important source to capture citizens’ concerns and trust in policy makers and are suitable to evaluate the success of government policies.

pdf abs
Impact of Training Instance Selection on Domain-Specific Entity Extraction using BERT
Eileen Salhofer | Xing Lan Liu | Roman Kern
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop

State of the art performances for entity extraction tasks are achieved by supervised learning, specifically, by fine-tuning pretrained language models such as BERT. As a result, annotating application specific data is the first step in many use cases. However, no practical guidelines are available for annotation requirements. This work supports practitioners by empirically answering the frequently asked questions (1) how many training samples to annotate? (2) which examples to annotate? We found that BERT achieves up to 80% F1 when fine-tuned on only 70 training examples, especially on biomedical domain. The key features for guiding the selection of high performing training instances are identified to be pseudo-perplexity and sentence-length. The best training dataset constructed using our proposed selection strategy shows F1 score that is equivalent to a random selection with twice the sample size. The requirement of only a small number of training data implies cheaper implementations and opens door to wider range of applications.

2019

pdf abs
Know-Center at SemEval-2019 Task 5: Multilingual Hate Speech Detection on Twitter using CNNs
Kevin Winter | Roman Kern
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents the Know-Center system submitted for task 5 of the SemEval-2019 workshop. Given a Twitter message in either English or Spanish, the task is to first detect whether it contains hateful speech and second, to determine the target and level of aggression used. For this purpose our system utilizes word embeddings and a neural network architecture, consisting of both dilated and traditional convolution layers. We achieved average F1-scores of 0.57 and 0.74 for English and Spanish respectively.

2017

pdf abs
Know-Center at SemEval-2017 Task 10: Sequence Classification with the CODE Annotator
Roman Kern | Stefan Falk | Andi Rexha
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our participation in SemEval-2017 Task 10. We competed in Subtask 1 and 2 which consist respectively in identifying all the key phrases in scientific publications and label them with one of the three categories: Task, Process, and Material. These scientific publications are selected from Computer Science, Material Sciences, and Physics domains. We followed a supervised approach for both subtasks by using a sequential classifier (CRF - Conditional Random Fields). For generating our solution we used a web-based application implemented in the EU-funded research project, named CODE. Our system achieved an F1 score of 0.39 for the Subtask 1 and 0.28 for the Subtask 2.

Roman Kern

2022

2019

2017

2016

2014

2013

2010

Co-authors

Venues