Peng Lin
2026
SPEAK: Spiking Neurons as an Entropy-Aware Tokenizer for Large Language Models
Ming Chen | Wenyao Li | Chao Liang | Shi Gu | Peng Lin | De Ma | Huajin Tang | Qian Zheng | Gang Pan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Ming Chen | Wenyao Li | Chao Liang | Shi Gu | Peng Lin | De Ma | Huajin Tang | Qian Zheng | Gang Pan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Tokenizers play a critical role in large language model studies. Despite recent advances, existing tokenizers fail to explicitly leverage historical tokenization results when making subsequent token decisions, nor do they selectively utilize such history based on contextual relevance. We propose SPEAK, a tokenizer that integrates spiking neurons to explicitly leverage historical tokenization results. Furthermore, we introduce an entropy-aware reset mechanism that selectively leverages history based on contextual relevance, which is determined by token-level entropy. High-entropy tokens are treated as contextual boundaries, whereas low-entropy tokens between consecutive such boundaries exhibit strong contextual relevance. Accordingly, we induce hard reset at high-entropy tokens to discard irrelevant historical tokenization results, and soft reset at low-entropy tokens to preserve and leverage relevant history. Experiments on 2 language models and 5 datasets spanning 16 languages demonstrate superior cross-lingual adaptability, with competitive performance and efficiency. Our code is publicly available at https://github.com/zju-bmi-lab/SPEAK.
2023
Automatic Marketing Theme and Commodity Construction System for E-commerce
Zhiping Wang | Peng Lin | Hainan Zhang | Hongshen Chen | Tianhao Li | Zhuoye Ding | Sulong Xu | Jinghe Hu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Zhiping Wang | Peng Lin | Hainan Zhang | Hongshen Chen | Tianhao Li | Zhuoye Ding | Sulong Xu | Jinghe Hu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
When consumers’ shopping needs are concentrated, they are more interested in the collection of commodities under the specific marketing theme. Therefore, mining marketing themes and their commodities collections can help customers save shopping costs and improve user clicks and purchases for recommendation system. However, the current system invites experts to write marketing themes and select the relevant commodities, which suffer from difficulty in mass production, poor timeliness and low online indicators. Therefore, we propose a automatic marketing theme and commodity construction system, which can not only generate popular marketing themes and select the relevant commodities automatically, but also improve the theme online effectiveness in the recommendation system. Specifically, we firstly utilize the pretrained language model to generate the marketing themes. And then, we utilize the theme-commodity consistency module to select the relevant commodities for the above generative theme. What’s more, we also build the indicator simulator to evaluate the effectiveness of the above generative theme. When the indicator is lower, the above selective commodities will be input into the theme-rewriter module to generate more efficient marketing themes. Finally, we utilize the human screening to control the system quality. Both the offline experiments and online A/B test demonstrate the superior performance of our proposed system compared with state-of-the-art methods.
2022
Automatic Scene-based Topic Channel Construction System for E-Commerce
Peng Lin | Yanyan Zou | Lingfei Wu | Mian Ma | Zhuoye Ding | Bo Long
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Peng Lin | Yanyan Zou | Lingfei Wu | Mian Ma | Zhuoye Ding | Bo Long
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Scene marketing that well demonstrates user interests within a certain scenario has proved effective for offline shopping. To conduct scene marketing for e-commerce platforms, this work presents a novel product form, scene-based topic channel which typically consists of a list of diverse products belonging to the same usage scenario and a topic title that describes the scenario with marketing words. As manual construction of channels is time-consuming due to billions of products as well as dynamic and diverse customers’ interests, it is necessary to leverage AI techniques to automatically construct channels for certain usage scenarios and even discover novel topics. To be specific, we first frame the channel construction task as a two-step problem, i.e., scene-based topic generation and product clustering, and propose an E-commerce Scene-based Topic Channel construction system (i.e., ESTC) to achieve automated production, consisting of scene-based topic generation model for the e-commerce domain, product clustering on the basis of topic similarity, as well as quality control based on automatic model filtering and human screening. Extensive offline experiments and online A/B test validates the effectiveness of such a novel product form as well as the proposed system. In addition, we also introduce the experience of deploying the proposed system on a real-world e-commerce recommendation platform.