Nathan Dykes



2024

Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models
Nathan Dykes | Stephanie Evert | Philipp Heinrich | Merlin Humml | Lutz Schröder
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024

We use query results from manually designed corpus queries for fine-tuning an LLM to identify argumentative fragments as a text mining task. The resulting model outperforms both an LLM fine-tuned on a relatively large manually annotated gold standard of tweets and a rule-based approach. This proof-of-concept study demonstrates the usefulness of corpus queries to generate training data for complex text categorisation tasks, especially if the targeted category has low prevalence (so that a manually annotated gold standard contains only a small number of positive examples).

Automatic Identification of COVID-19-Related Conspiracy Narratives in German Telegram Channels and Chats
Philipp Heinrich | Andreas Blombach | Bao Minh Doan Dang | Leonardo Zilio | Linda Havenstein | Nathan Dykes | Stephanie Evert | Fabian Schäfer
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We are concerned with mapping the discursive landscape of conspiracy narratives surrounding the COVID-19 pandemic. In the present study, we analyse a corpus of more than 1,000 German Telegram posts tagged with 14 fine-grained conspiracy narrative labels by three independent annotators. Since emerging narratives on social media are short-lived and notoriously hard to track, we experiment with different state-of-the-art approaches to few-shot and zero-shot text classification. We report performance in terms of ROC-AUC and in terms of optimal F1, and compare fine-tuned methods with off-the-shelf approaches and human performance.

2023

A Pipeline for the Creation of Multimodal Corpora from YouTube Videos
Nathan Dykes | Anna Wilson | Peter Uhrig
Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing