Jiyuan Liu

2025

Topic modeling is a powerful unsupervised tool for knowledge discovery. However, existing work often generates low-quality topics that are uninformative and incoherent, which hinders interpretable insights when managing textual data. In this paper, we improve the original variational autoencoder framework by incorporating contextual and graph information to address these issues. First, the encoder uses topic fusion techniques to combine contextual and bag-of-words information, while exploiting topic alignment and topic sharpening constraints to generate informative topics. Second, we develop a simple word co-occurrence graph information fusion strategy that efficiently increases topic coherence. On three benchmark datasets, our new framework generates more coherent and diverse topics than various baselines, and achieves strong performance on both automatic and manual evaluations.
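
The abstract above mentions a word co-occurrence graph but does not say how it is constructed. As a rough illustration only, the sketch below assumes document-level co-occurrence: two words are linked if they appear in the same document, and the edge weight is the number of such documents. The function name `build_cooccurrence_graph`, the `min_count` threshold, and the toy documents are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(docs, min_count=1):
    """Return {(w1, w2): weight} with w1 < w2 alphabetically.

    The weight is the number of documents containing both words
    (an assumed definition; the paper may use a different one).
    """
    edge_counts = Counter()
    for doc in docs:
        # Deduplicate and sort so each unordered pair has one canonical key.
        vocab = sorted(set(doc.lower().split()))
        edge_counts.update(combinations(vocab, 2))
    # Keep only edges seen in at least `min_count` documents.
    return {edge: c for edge, c in edge_counts.items() if c >= min_count}


# Toy corpus (hypothetical).
docs = [
    "topic model graph",
    "topic model coherence",
    "graph coherence topic",
]
graph = build_cooccurrence_graph(docs, min_count=2)
for (w1, w2), weight in sorted(graph.items()):
    print(w1, w2, weight)
```

Filtering by `min_count` is one simple way to drop incidental pairs before the graph is fused into a model; windowed or PMI-weighted co-occurrence are common alternatives.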

CARE: A Disagreement Detection Framework with Concept Alignment and Reasoning Enhancement
Jiyuan Liu | Jielin Song | Yunhe Pang | Zhiyu Shen | Yanghui Rao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Disagreement detection is a crucial task in natural language processing (NLP), particularly in analyzing online discussions and social media content. Large language models (LLMs) have demonstrated significant advancements across various NLP tasks. However, the performance of LLMs in disagreement detection is limited by two issues: a *conceptual gap* and a *reasoning gap*. In this paper, we propose a novel two-stage framework, Concept Alignment and Reasoning Enhancement (CARE), to tackle these issues. The first stage, Concept Alignment, addresses the gap between expert and model by performing **sub-concept taxonomy extraction**, aligning the model’s comprehension with that of human experts. The second stage, Reasoning Enhancement, improves the model’s reasoning capabilities by introducing a curriculum learning workflow, which includes **rationale to critique** and **counterfactual to detection** for reducing spurious associations. Extensive experiments on the disagreement detection task demonstrate the effectiveness of our framework, showing superior performance in zero-shot and supervised learning settings, both within and across domains.

2024

Unsupervised Hierarchical Topic Modeling via Anchor Word Clustering and Path Guidance
Jiyuan Liu | Hegang Chen | Chunjiang Zhu | Yanghui Rao
Findings of the Association for Computational Linguistics: EMNLP 2024

Current hierarchical topic models tend to capture the relationship between words and topics, often ignoring the role of anchor words that guide text generation. For the first time, we detect and add anchor words to the text generation process in an unsupervised way. Firstly, we adopt a clustering algorithm to adaptively detect anchor words that are highly consistent with each topic, which forms the path of topic → anchor word. Secondly, we add the causal path of anchor word → word to the popular Variational Auto-Encoder (VAE) framework by implicitly using word co-occurrence graphs. Thirdly, we develop the causal path of topic+anchor word → higher-layer topic, which aids the expression of topic concepts with anchor words to capture a more semantically tight hierarchical topic structure. Finally, we enhance the model’s representation of the anchor words through a novel contrastive learning method. After jointly training the aforementioned constraint objectives, we can produce more coherent and diverse topics with a better hierarchical structure. Extensive experiments on three datasets show that our model outperforms state-of-the-art methods.

Co-authors

Li Qing 1

Zhiyu Shen 1

Jielin Song 1

Jiaxing Yan 1

Venues

emnlp 2
findings 1
