STREAM-ZH: Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language

Hongyi Li, Jianjun Lian, Anton Frederik Thielmann, Andre Python


Abstract
We introduce the Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language (STREAM-ZH), the first topic modeling package to fully support Chinese across a broad range of topic models, evaluation metrics, and preprocessing workflows. Tailored to both Simplified and Traditional Chinese, the package extends the STREAM topic modeling framework with a curated collection of preprocessed Chinese text datasets, on which we assess the performance of classical, neural, and clustering topic models using commonly used intruder, diversity, and coherence metrics. A benchmark analysis provides evidence that, within our framework, topic models can generate coherent and diverse topics from Chinese datasets, outperforming topics generated by models that use English-translated input. Our framework facilitates multilingual accessibility and research in topic modeling applied to Chinese textual data. The code is available at https://github.com/AnFreTh/STREAM
Anthology ID:
2026.eacl-short.28
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
371–383
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.28/
Cite (ACL):
Hongyi Li, Jianjun Lian, Anton Frederik Thielmann, and Andre Python. 2026. STREAM-ZH: Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 371–383, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
STREAM-ZH: Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language (Li et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.28.pdf
Checklist:
2026.eacl-short.28.checklist.pdf