Principled Analysis of Energy Discourse across Domains with Thesaurus-based Automatic Topic Labeling
Abstract
With the increasing impact of Natural Language Processing tools like topic models in social science research, the experimental rigor and comparability of models and datasets have come under scrutiny. Especially when contributing to research on topics with worldwide impact, such as energy policy, objective analyses and reliable datasets are necessary. We contribute toward this goal in two ways. First, we release two diachronic corpora covering 23 years of energy discussions in the U.S. Energy Information Administration. Second, we propose a simple and theoretically sound method for automatic topic labeling that draws on political thesauri. We empirically evaluate the quality of our labels, apply our labeling to topics induced by diachronic topic models on our energy corpora, and present a detailed analysis.
- Anthology ID:
- 2021.alta-1.11
- Volume:
- Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
- Month:
- December
- Year:
- 2021
- Address:
- Online
- Venue:
- ALTA
- Publisher:
- Australasian Language Technology Association
- Pages:
- 107–118
- URL:
- https://aclanthology.org/2021.alta-1.11
- Cite (ACL):
- Thomas Scelsi, Alfonso Martinez Arranz, and Lea Frermann. 2021. Principled Analysis of Energy Discourse across Domains with Thesaurus-based Automatic Topic Labeling. In Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association, pages 107–118, Online. Australasian Language Technology Association.
- Cite (Informal):
- Principled Analysis of Energy Discourse across Domains with Thesaurus-based Automatic Topic Labeling (Scelsi et al., ALTA 2021)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2021.alta-1.11.pdf
- Code
- tscelsi/dtm-toolkit
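The abstract names thesaurus-based topic labeling but does not spell out the scoring, so the following is only a minimal, generic sketch of the idea and not the authors' method (their code is in tscelsi/dtm-toolkit). It assumes a thesaurus flattened into a hypothetical mapping from concept label to a set of terms, and scores each concept by the total probability the topic assigns to that concept's terms. The function name `label_topic`, the thesaurus entries, and the topic weights are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def label_topic(
    topic_word_probs: Dict[str, float],
    thesaurus: Dict[str, Set[str]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical labeler: rank thesaurus concepts by the topic probability mass on their terms."""
    scores: Dict[str, float] = {}
    for concept, terms in thesaurus.items():
        # Sum the topic probability of every term of this concept (0.0 if the topic lacks it).
        mass = sum(topic_word_probs.get(term, 0.0) for term in terms)
        if mass > 0:
            scores[concept] = mass
    # Highest-scoring concepts first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Illustrative (made-up) thesaurus and topic; not taken from the paper or its corpora.
thesaurus = {
    "renewable energy": {"solar", "wind", "renewable", "photovoltaic"},
    "fossil fuel": {"coal", "oil", "gas", "petroleum"},
    "energy policy": {"policy", "regulation", "subsidy"},
}
topic = {"solar": 0.20, "wind": 0.15, "policy": 0.10, "coal": 0.05, "price": 0.05}

labels = label_topic(topic, thesaurus)
print([label for label, _ in labels])  # ['renewable energy', 'energy policy', 'fossil fuel']
```

Summing raw probabilities favours concepts with many matching terms; a real system might normalise by concept size or weight terms differently, which this sketch deliberately leaves out.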