Alfonso Martinez Arranz


2021

Principled Analysis of Energy Discourse across Domains with Thesaurus-based Automatic Topic Labeling
Thomas Scelsi | Alfonso Martinez Arranz | Lea Frermann
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

With the increasing impact of Natural Language Processing tools like topic models in social science research, the experimental rigor and comparability of models and datasets have come under scrutiny. Especially when contributing to research on topics with worldwide impacts like energy policy, objective analyses and reliable datasets are necessary. We contribute toward this goal in two ways. First, we release two diachronic corpora covering 23 years of energy discussions in the U.S. Energy Information Administration. Second, we propose a simple and theoretically sound method for automatic topic labelling drawing on political thesauri. We empirically evaluate the quality of our labels, apply our labelling to topics induced by diachronic topic models on our energy corpora, and present a detailed analysis.