Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts

Yiming Wang, Ximing Li, Xiaotang Zhou, Jihong Ouyang


Abstract
Short texts, such as Twitter posts, news titles, and product reviews, have become an increasingly common form of text data. Extracting semantic topics from short texts plays a significant role in a wide spectrum of NLP applications, and neural topic modeling is now a major tool for this task. To learn more coherent and semantically meaningful topics, in this paper we develop a novel neural topic model named Dual Word Graph Topic Model (DWGTM), which extracts topics from a word co-occurrence graph and a word semantic correlation graph simultaneously. Specifically, we learn word features from the global word co-occurrence graph so as to ingest rich word co-occurrence information; we then generate text features from the word features and feed them into an encoder network to obtain per-text topic proportions; finally, we reconstruct the texts and the word co-occurrence graph with topical distributions and word features, respectively. In addition, to capture the semantics of words, we apply the word features to reconstruct a word semantic correlation graph computed from pre-trained word embeddings. Building on these ideas, we formulate DWGTM in an auto-encoding paradigm and train it efficiently in the spirit of neural variational inference. Empirical results validate that DWGTM can generate more semantically coherent topics than baseline topic models.
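To make the two graphs named in the abstract concrete, the following is a minimal sketch of how they might be built from a toy corpus. It is not the authors' implementation: the abstract does not specify how edges are weighted or thresholded, so the document-level co-occurrence counts, the cosine-similarity threshold of 0.5, the function names, and the two-dimensional `toy_embeddings` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def cooccurrence_graph(texts):
    """Count, for each unordered word pair, how many texts contain both words.

    Assumption: each short text is treated as one co-occurrence window.
    """
    edges = Counter()
    for text in texts:
        words = sorted(set(text.lower().split()))
        for u, v in combinations(words, 2):
            edges[(u, v)] += 1
    return edges


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_graph(embeddings, threshold=0.5):
    """Keep an edge between two words when their embedding similarity meets the threshold.

    Assumption: `embeddings` maps each word to a pre-trained vector; the
    threshold value is hypothetical.
    """
    edges = {}
    for u, v in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[u], embeddings[v])
        if sim >= threshold:
            edges[(u, v)] = sim
    return edges


# Toy data, invented for illustration only.
texts = ["apple releases new phone", "new phone review", "apple pie recipe"]
toy_embeddings = {"apple": [1.0, 0.0], "phone": [0.9, 0.1], "pie": [0.0, 1.0]}

co_graph = cooccurrence_graph(texts)
sem_graph = semantic_graph(toy_embeddings)
```

In this sketch both graphs are keyed by alphabetically ordered word pairs, so the same pair can be looked up in either graph. A model along the lines described in the abstract would consume such graphs as training inputs and reconstruction targets; the encoder and the reconstruction objectives themselves are not shown here.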
Anthology ID:
2021.findings-emnlp.2
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
18–27
URL:
https://aclanthology.org/2021.findings-emnlp.2
DOI:
10.18653/v1/2021.findings-emnlp.2
Cite (ACL):
Yiming Wang, Ximing Li, Xiaotang Zhou, and Jihong Ouyang. 2021. Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 18–27, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts (Wang et al., Findings 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.findings-emnlp.2.pdf
Video:
 https://preview.aclanthology.org/ingestion-script-update/2021.findings-emnlp.2.mp4