Wei Zhong

2026

Helios: A Foundational Language Model for Smart Energy Knowledge Reasoning and Application
Haoyu Jiang | Fanjie Zeng | Boan Qu | Xiaojie Lin | Wei Zhong
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

In the global drive toward carbon neutrality, deeply coordinated smart energy systems underpin industrial transformation, yet their interdisciplinary, fragmented, and fast-evolving expertise prevents general-purpose LLMs, lacking domain knowledge and physical-constraint awareness, from delivering precise engineering-aligned inference and generation. To address these challenges, we introduce Helios, the first large language model tailored to the smart energy domain, together with a comprehensive suite of resources to advance LLM research in this field. Specifically, we develop Enersys, a multi-agent collaborative framework for end-to-end dataset construction, through which we produce: (1) the first smart energy knowledge base, EnerBase, to enrich the model’s foundational expertise; (2) the first instruction fine-tuning dataset, EnerInstruct, to strengthen performance on domain-specific downstream tasks; and (3) the first RLHF dataset, EnerReinforce, to align the model with human preferences and industry standards. Leveraging these resources, Helios undergoes large-scale pretraining, SFT, and RLHF. We also release EnerBench, the first benchmark for evaluating LLMs in smart energy scenarios, and demonstrate that our approach significantly enhances domain knowledge mastery, task execution accuracy, and alignment with human preferences.

2022

Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval
Wei Zhong | Jheng-Hong Yang | Yuqing Xie | Jimmy Lin
Findings of the Association for Computational Linguistics: EMNLP 2022

With the recent success of dense retrieval methods based on bi-encoders, studies have applied this approach to various interesting downstream retrieval tasks with good efficiency and in-domain effectiveness. Recently, we have also seen the presence of dense retrieval models in Math Information Retrieval (MIR) tasks, but the most effective systems remain classic retrieval methods that consider hand-crafted structure features. In this work, we try to combine the best of both worlds: a well-defined structure search method for effective formula search and efficient bi-encoder dense retrieval models to capture contextual similarities. Specifically, we have evaluated two representative bi-encoder models for token-level and passage-level dense retrieval on recent MIR tasks. Our results show that bi-encoder models are highly complementary to existing structure search methods, and we are able to advance the state-of-the-art on MIR datasets.

Co-authors

Jheng-Hong Yang 1

Fanjie Zeng 1

Venues

EACL 1
Findings 1
