Andrei Mircea


Discourse-Aware Unsupervised Summarization for Long Scientific Documents
Yue Dong | Andrei Mircea | Jackie Chi Kit Cheung
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We propose an unsupervised graph-based ranking model for extractive summarization of long scientific documents. Our method assumes a two-level hierarchical graph representation of the source document, and exploits asymmetrical positional cues to determine sentence importance. Results on the PubMed and arXiv datasets show that our approach outperforms strong unsupervised baselines by wide margins in automatic metrics and human evaluation. In addition, it achieves performance comparable to many state-of-the-art supervised approaches which are trained on hundreds of thousands of examples. These results suggest that patterns in the discourse structure are a strong signal for determining importance in scientific articles.

Bridging the gap between supervised classification and unsupervised topic modelling for social-media assisted crisis management
Mikael Brunila | Rosie Zhao | Andrei Mircea | Sam Lumley | Renee Sieber
Proceedings of the Second Workshop on Domain Adaptation for NLP

Social media such as Twitter provide valuable information to crisis managers and affected people during natural disasters. Machine learning can help structure and extract information from the large volume of messages shared during a crisis; however, the constantly evolving nature of crises makes effective domain adaptation essential. Supervised classification is limited by unchangeable class labels that may not be relevant to new events, and unsupervised topic modelling by insufficient prior knowledge. In this paper, we bridge the gap between the two and show that BERT embeddings finetuned on crisis-related tweet classification can effectively be used to adapt to a new crisis, discovering novel topics while preserving relevant classes from supervised training, and leveraging bidirectional self-attention to extract topic keywords. We create a dataset of tweets from a snowstorm to evaluate our method’s transferability to new crises, and find that it outperforms traditional topic models in both automatic, and human evaluations grounded in the needs of crisis managers. More broadly, our method can be used for textual domain adaptation where the latent classes are unknown but overlap with known classes from other domains.


Real-time Classification, Geolocation and Interactive Visualization of COVID-19 Information Shared on Social Media to Better Understand Global Developments
Andrei Mircea
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

As people communicate on social media during COVID-19, it can be an invaluable source of useful and up-to-date information. However, the large volume and noise-to-signal ratio of social media can make this impractical. We present a prototype dashboard for the real-time classification, geolocation and interactive visualization of COVID-19 tweets that addresses these issues. We also describe a novel L2 classification layer that outperforms linear layers on a dataset of respiratory virus tweets.