Jakob Berndt


2022

pdf
Incorporating Stock Market Signals for Twitter Stance Detection
Costanza Conforti | Jakob Berndt | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Research in stance detection has so far focused on models which leverage purely textual input. In this paper, we investigate the integration of textual and financial signals for stance detection in the financial domain. Specifically, we propose a robust multi-task neural architecture that combines textual input with high-frequency intra-day time series from stock market prices. Moreover, we extend wt–wt, an existing stance detection dataset which collects tweets discussing Mergers and Acquisitions operations, with the relevant financial signal. Importantly, the obtained dataset aligns with Stander, an existing news stance detection dataset, thus resulting in a unique multimodal, multi-genre stance detection resource. We show experimentally and through detailed result analysis that our stance detection system benefits from financial information, and achieves state-of-the-art results on the wt–wt dataset: this demonstrates that the combination of multiple input signals is effective for cross-target stance detection, and opens interesting research directions for future work.

2021

pdf bib
Adversarial Training for News Stance Detection: Leveraging Signals from a Multi-Genre Corpus.
Costanza Conforti | Jakob Berndt | Marco Basaldella | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Cross-target generalization constitutes an important issue for news Stance Detection (SD). In this short paper, we investigate adversarial cross-genre SD, where knowledge from annotated user-generated data is leveraged to improve news SD on targets unseen during training. We implement a BERT-based adversarial network and show experimental performance improvements over a set of strong baselines. Given the abundance of user-generated data, which are considerably less expensive to retrieve and annotate than news articles, this constitutes a promising research direction.

pdf
Synthetic Examples Improve Cross-Target Generalization: A Study on Stance Detection on a Twitter corpus.
Costanza Conforti | Jakob Berndt | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Cross-target generalization is a known problem in stance detection (SD), where systems tend to perform poorly when exposed to targets unseen during training. Given that data annotation is expensive and time-consuming, finding ways to leverage abundant unlabeled in-domain data can offer great benefits. In this paper, we apply a weakly supervised framework to enhance cross-target generalization through synthetically annotated data. We focus on Twitter SD and show experimentally that integrating synthetic data is helpful for cross-target generalization, leading to significant improvements in performance, with gains in F1 scores ranging from +3.4 to +5.1.

2020

pdf
STANDER: An Expert-Annotated Dataset for News Stance Detection and Evidence Retrieval
Costanza Conforti | Jakob Berndt | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2020

We present a new challenging news dataset that targets both stance detection (SD) and fine-grained evidence retrieval (ER). With its 3,291 expert-annotated articles, the dataset constitutes a high-quality benchmark for future research in SD and multi-task learning. We provide a detailed description of the corpus collection methodology and carry out an extensive analysis on the sources of disagreement between annotators, observing a correlation between their disagreement and the diffusion of uncertainty around a target in the real world. Our experiments show that the dataset poses a strong challenge to recent state-of-the-art models. Notably, our dataset aligns with an existing Twitter SD dataset: their union thus addresses a key shortcoming of previous works, by providing the first dedicated resource to study multi-genre SD as well as the interplay of signals from social media and news sources in rumour verification.

pdf
Will-They-Won’t-They: A Very Large Dataset for Stance Detection on Twitter
Costanza Conforti | Jakob Berndt | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a new challenging stance detection dataset, called Will-They-Won’t-They (WT--WT), which contains 51,284 tweets in English, making it by far the largest available dataset of the type. All the annotations are carried out by experts; therefore, the dataset constitutes a high-quality and reliable benchmark for future research in stance detection. Our experiments with a wide range of recent state-of-the-art stance detection systems show that the dataset poses a strong challenge to existing models in this domain.