STREAM-ZH: Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language
Hongyi Li, Jianjun Lian, Anton Frederik Thielmann, Andre Python
Abstract
We introduce the Simplified Topic Retrieval Exploration and Analysis Module for the Chinese language (STREAM-ZH), the first topic modeling package to fully support the Chinese language across a broad range of topic models, evaluation metrics, and preprocessing workflows. Tailored to both simplified and traditional Chinese, our package extends the STREAM topic modeling framework with a curated collection of preprocessed Chinese textual datasets, on which we assess the performance of classical, neural, and clustering topic models using commonly used intruder, diversity, and coherence metrics. A benchmark analysis provides evidence that, within our framework, topic models may generate coherent and diverse topics from Chinese-language datasets, outperforming topics generated by topic models using English-translated textual input. Our framework facilitates multilingual accessibility and research in topic modeling applied to Chinese textual data. The code is available at the following link: https://github.com/AnFreTh/STREAM-
- Anthology ID:
- 2026.eacl-short.28
- Volume:
- Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 371–383
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.28/
- Cite (ACL):
- Hongyi Li, Jianjun Lian, Anton Frederik Thielmann, and Andre Python. 2026. STREAM-ZH: Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 371–383, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- STREAM-ZH: Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language (Li et al., EACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.28.pdf