Nathan Dykes

2024

Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models
Nathan Dykes | Stephanie Evert | Philipp Heinrich | Merlin Humml | Lutz Schröder
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024

We use query results from manually designed corpus queries for fine-tuning an LLM to identify argumentative fragments as a text mining task. The resulting model outperforms both an LLM fine-tuned on a relatively large manually annotated gold standard of tweets and a rule-based approach. This proof-of-concept study demonstrates the usefulness of corpus queries to generate training data for complex text categorisation tasks, especially if the targeted category has low prevalence (so that a manually annotated gold standard contains only a small number of positive examples).

Automatic Identification of COVID-19-Related Conspiracy Narratives in German Telegram Channels and Chats
Philipp Heinrich | Andreas Blombach | Bao Minh Doan Dang | Leonardo Zilio | Linda Havenstein | Nathan Dykes | Stephanie Evert | Fabian Schäfer
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We are concerned with mapping the discursive landscape of conspiracy narratives surrounding the COVID-19 pandemic. In the present study, we analyse a corpus of more than 1,000 German Telegram posts tagged with 14 fine-grained conspiracy narrative labels by three independent annotators. Since emerging narratives on social media are short-lived and notoriously hard to track, we experiment with different state-of-the-art approaches to few-shot and zero-shot text classification. We report performance in terms of ROC-AUC and in terms of optimal F1, and compare fine-tuned methods with off-the-shelf approaches and human performance.