Kunal Verma
2021
Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19
Muhammad Abdul-Mageed | AbdelRahim Elmadany | El Moatez Billah Nagoudi | Dinesh Pabbi | Kunal Verma | Rannie Lin
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19. The dataset is diverse (covers 268 countries), longitudinal (goes back as far as 2007), multilingual (comes in 100+ languages), and has a significant number of location-tagged tweets (~169M tweets). We release tweet IDs from the dataset. We also develop two powerful models, one for identifying whether or not a tweet is related to the pandemic (best F1=97%) and another for detecting misinformation about COVID-19 (best F1=92%). A human annotation study reveals the utility of our models on a subset of Mega-COV. Our data and models can be useful for studying a wide host of phenomena related to the pandemic. Mega-COV and our models are publicly available.
2012
Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
Doo Soon Kim | Kunal Verma | Peter Yeh
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning