An Empirical Analysis of Topic Models: Uncovering the Relationships between Hyperparameters, Document Length and Performance Measures

Silvia Terragni, Elisabetta Fersini


Abstract
Neural Topic Models are recent neural models that aim at extracting the main themes from a collection of documents. The comparison of these models is usually limited because the hyperparameters are held fixed. In this paper, we present an empirical analysis and comparison of Neural Topic Models by finding the optimal hyperparameters of each model for four different performance measures adopting a single-objective Bayesian optimization. This allows us to determine the robustness of a topic model for several evaluation metrics. We also empirically show the effect of the length of the documents on different optimized metrics and discover which evaluation metrics are in conflict or agreement with each other.
Anthology ID:
2021.ranlp-1.157
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Venue:
RANLP
SIG:
Publisher:
INCOMA Ltd.
Note:
Pages:
1408–1416
Language:
URL:
https://aclanthology.org/2021.ranlp-1.157
DOI:
Bibkey:
Cite (ACL):
Silvia Terragni and Elisabetta Fersini. 2021. An Empirical Analysis of Topic Models: Uncovering the Relationships between Hyperparameters, Document Length and Performance Measures. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1408–1416, Held Online. INCOMA Ltd..
Cite (Informal):
An Empirical Analysis of Topic Models: Uncovering the Relationships between Hyperparameters, Document Length and Performance Measures (Terragni & Fersini, RANLP 2021)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.ranlp-1.157.pdf