Stephanie Evert
2024
Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models
Nathan Dykes | Stephanie Evert | Philipp Heinrich | Merlin Humml | Lutz Schröder
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024
We use the results of manually designed corpus queries to fine-tune an LLM that identifies argumentative fragments, framed as a text mining task. The resulting model outperforms both an LLM fine-tuned on a relatively large manually annotated gold standard of tweets and a rule-based approach. This proof-of-concept study demonstrates the usefulness of corpus queries for generating training data for complex text categorisation tasks, especially if the targeted category has low prevalence (so that a manually annotated gold standard contains only a small number of positive examples).
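As a rough illustration of the idea in this abstract, the sketch below harvests training examples from texts matched by hand-written, high-precision patterns. It is not the authors' pipeline: plain Python regular expressions stand in for corpus queries, and the pattern names, the patterns themselves, and the `harvest_examples` helper are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical high-precision patterns (assumption: regexes stand in for
# real corpus queries, which the abstract does not spell out).
QUERIES: Dict[str, "re.Pattern[str]"] = {
    "should_claim": re.compile(
        r"\b(we|they|the government) (should|must|ought to) \w+", re.I
    ),
    "because_reason": re.compile(r"\bbecause (it|this|they) \w+", re.I),
}


def harvest_examples(texts: List[str]) -> List[Dict[str, str]]:
    """Return one positive training example per text matched by any query."""
    examples = []
    for text in texts:
        for name, pattern in QUERIES.items():
            match = pattern.search(text)
            if match:
                examples.append(
                    {"text": text, "fragment": match.group(0), "query": name}
                )
                break  # the first matching query is enough for this text
    return examples


texts = [
    "We should leave the union now.",
    "Lovely weather today.",
    "Prices rose because they printed money.",
]
examples = harvest_examples(texts)
for ex in examples:
    print(ex["query"], "->", ex["fragment"])
```

Because the patterns favour precision over recall, the matched texts can serve as positive examples even when the target category is rare; how negatives are chosen and how the model is fine-tuned are left open here.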
Automatic Identification of COVID-19-Related Conspiracy Narratives in German Telegram Channels and Chats
Philipp Heinrich | Andreas Blombach | Bao Minh Doan Dang | Leonardo Zilio | Linda Havenstein | Nathan Dykes | Stephanie Evert | Fabian Schäfer
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We are concerned with mapping the discursive landscape of conspiracy narratives surrounding the COVID-19 pandemic. In the present study, we analyse a corpus of more than 1,000 German Telegram posts tagged with 14 fine-grained conspiracy narrative labels by three independent annotators. Since emerging narratives on social media are short-lived and notoriously hard to track, we experiment with different state-of-the-art approaches to few-shot and zero-shot text classification. We report performance in terms of ROC-AUC and in terms of optimal F1, and compare fine-tuned methods with off-the-shelf approaches and human performance.
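For readers unfamiliar with the two evaluation measures named in this abstract, the following minimal sketch computes ROC-AUC and optimal F1 for one binary label from classifier scores. It uses only the standard library; the function names and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
from typing import List, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def optimal_f1(labels: List[int], scores: List[float]) -> Tuple[float, float]:
    """Best F1 over all thresholds t, predicting positive when score >= t."""
    total_pos = sum(labels)
    best_f1, best_t = 0.0, max(scores)
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        fn = total_pos - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:  # keep the lowest threshold among equal F1 values
            best_f1, best_t = f1, t
    return best_f1, best_t


# Toy data (assumption): 1 = post carries the narrative label, 0 = it does not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
f1, threshold = optimal_f1(labels, scores)
print(f"ROC-AUC = {auc:.3f}, optimal F1 = {f1:.3f} at threshold {threshold}")
```

ROC-AUC is threshold-free, whereas optimal F1 reports the best score attainable if the decision threshold is tuned on the evaluated data, so it should be read as an upper bound on F1 for a fixed threshold.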
Co-authors
- Nathan Dykes 2
- Philipp Heinrich 2
- Merlin Humml 1
- Lutz Schröder 1
- Andreas Blombach 1