Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts

Yiming Wang; Ximing Li; Xiaotang Zhou; Jihong Ouyang

doi:10.18653/v1/2021.findings-emnlp.2

Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts

Yiming Wang, Ximing Li, Xiaotang Zhou, Jihong Ouyang

Abstract

Short text nowadays has become a more fashionable form of text data, e.g., Twitter posts, news titles, and product reviews. Extracting semantic topics from short texts plays a significant role in a wide spectrum of NLP applications, and neural topic modeling is now a major tool to achieve it. Motivated by learning more coherent and semantic topics, in this paper we develop a novel neural topic model named Dual Word Graph Topic Model (DWGTM), which extracts topics from simultaneous word co-occurrence and semantic correlation graphs. To be specific, we learn word features from the global word co-occurrence graph, so as to ingest rich word co-occurrence information; we then generate text features with word features, and feed them into an encoder network to get topic proportions per-text; finally, we reconstruct texts and word co-occurrence graph with topical distributions and word features, respectively. Besides, to capture semantics of words, we also apply word features to reconstruct a word semantic correlation graph computed by pre-trained word embeddings. Upon those ideas, we formulate DWGTM in an auto-encoding paradigm and efficiently train it with the spirit of neural variational inference. Empirical results validate that DWGTM can generate more semantically coherent topics than baseline topic models.

Anthology ID:: 2021.findings-emnlp.2
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2021
Month:: November
Year:: 2021
Address:: Punta Cana, Dominican Republic
Editors:: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:: Findings
SIG:: SIGDAT
Publisher:: Association for Computational Linguistics
Note:
Pages:: 18–27
Language:
URL:: https://aclanthology.org/2021.findings-emnlp.2
DOI:: 10.18653/v1/2021.findings-emnlp.2
Bibkey:
Cite (ACL):: Yiming Wang, Ximing Li, Xiaotang Zhou, and Jihong Ouyang. 2021. Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 18–27, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):: Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts (Wang et al., Findings 2021)
Copy Citation:
PDF:: https://preview.aclanthology.org/naacl-24-ws-corrections/2021.findings-emnlp.2.pdf
Video:: https://preview.aclanthology.org/naacl-24-ws-corrections/2021.findings-emnlp.2.mp4

PDF Cite Search Video