@inproceedings{olsen-plank-2021-finding,
    title = "Finding the needle in a haystack: Extraction of Informative {COVID}-19 {D}anish Tweets",
    author = "Olsen, Benjamin  and
      Plank, Barbara",
    editor = "Xu, Wei  and
      Ritter, Alan  and
      Baldwin, Tim  and
      Rahimi, Afshin",
    booktitle = "Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)",
    month = nov,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2021.wnut-1.2/",
    doi = "10.18653/v1/2021.wnut-1.2",
    pages = "11--19",
    abstract = "Finding informative COVID-19 posts in a stream of tweets is very useful to monitor health-related updates. Prior work focused on a balanced data setup and on English, but informative tweets are rare, and English is only one of the many languages spoken in the world. In this work, we introduce a new dataset of 5,000 tweets for finding informative COVID-19 tweets for Danish. In contrast to prior work, which balances the label distribution, we model the problem by keeping its natural distribution. We examine how well a simple probabilistic model and a convolutional neural network (CNN) perform on this task. We find a weighted CNN to work well but it is sensitive to embedding and hyperparameter choices. We hope the contributed dataset is a starting point for further work in this direction."
}