Yipeng Yu
2026
TaoType: Predicting Fine-Grained Typing Intent for Faster Search
Yipeng Yu | Yichen Yuan | Chengxiao Feng | Xu Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
"Is the user’s current query input exactly what they intend to search for?" Our work aims to answer this question by determining, at each keystroke, whether the current query is complete. If so, a search is implicitly triggered in advance, without waiting for user confirmation. This reduces response time and improves the search experience. Specifically, we propose TaoType, a client-side framework with innovations in data sampling, feature selection, model design and training, and online strategy. Experiments in Taobao, a leading mobile shopping application, validate its effectiveness: TaoType achieves offline precision/recall/accuracy of 0.7936/0.8196/0.7742, respectively, and decreases online response time by 640.51±93.65 milliseconds, a substantial benefit to the search system. Unlike prior work that focuses on optimizing server-side engineering pipelines or simplifying ranking models, our method leverages client-side typing behavior for real-time early prediction, using on-device computation to reduce response time. To the best of our knowledge, our work is the first to identify and address this problem. This work also introduces App Intelligence, a new paradigm for enhancing mobile applications by integrating on-device AI to boost business value and user experience.
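The early-trigger idea in this abstract can be illustrated with a minimal sketch. It assumes a per-keystroke scoring function that returns the probability that the current prefix is a complete query, plus a fixed decision threshold. The names `first_trigger_index`, `score`, and `threshold`, and the threshold value itself, are hypothetical and not taken from the paper, which does not describe its model or online strategy at this level of detail.

```python
from typing import Callable, List, Optional


def first_trigger_index(
    prefixes: List[str],
    score: Callable[[str], float],
    threshold: float = 0.8,  # hypothetical value, not from the paper
) -> Optional[int]:
    """Return the index of the first keystroke at which a search would be
    triggered early, or None if no prefix reaches the threshold.

    `prefixes` is the sequence of query strings seen after each keystroke.
    `score` is a stand-in for an on-device model that estimates the
    probability that a prefix is already the complete query.
    """
    for i, prefix in enumerate(prefixes):
        if score(prefix) >= threshold:
            return i
    return None


def toy_score(prefix: str) -> float:
    """Toy stand-in scorer: treats exactly one known query as complete."""
    return 0.9 if prefix == "red dress" else 0.1


keystrokes = ["r", "re", "red", "red ", "red d", "red dress"]
trigger_at = first_trigger_index(keystrokes, toy_score)
```

In this toy run the search would fire at the last prefix only. A real scorer would presumably use typing-behavior features rather than an exact string match, and the threshold would trade off precision against recall.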
2025
SEAL: Structure and Element Aware Learning Improves Long Structured Document Retrieval
Xinhao Huang | Zhibo Ren | Yipeng Yu | Ying Zhou | Zulong Chen | Zeyi Wen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
In long structured document retrieval, existing methods typically fine-tune pre-trained language models (PLMs) using contrastive learning on datasets lacking explicit structural information. This practice suffers from two critical issues: 1) current methods fail to leverage structural features and element-level semantics effectively, and 2) datasets containing structural metadata are lacking. To bridge these gaps, we propose SEAL, a novel contrastive learning framework. It leverages structure-aware learning to preserve semantic hierarchies and masked element alignment for fine-grained semantic discrimination. Furthermore, we release StructDocRetrieval, a long structured document retrieval dataset with rich structural annotations. Extensive experiments on both the released and industrial datasets across various modern PLMs, together with online A/B testing, demonstrate consistent improvements, boosting NDCG@10 from 73.96% to 77.84% on BGE-M3. The resources are available at https://github.com/xinhaoH/SEAL.
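For readers unfamiliar with the reported metric, the following is a minimal sketch of NDCG@k using the common linear-gain, log2-discount definition. It assumes the input list holds graded relevance labels for all judged documents of one query, in ranked order. The abstract does not state which NDCG variant was used, so this illustrates the metric in general and does not reproduce the authors' evaluation code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal
    ranking (the same labels sorted in descending order)."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(relevances, k) / ideal_dcg


# A ranking that places the only relevant document second instead of first.
example = ndcg_at_k([0, 1])
```

A reported figure such as 77.84% would be this per-query value averaged over all test queries.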
2020
When and Who? Conversation Transition Based on Bot-Agent Symbiosis Learning Network
Yipeng Yu | Ran Guan | Jie Ma | Zhuoxuan Jiang | Jingchang Huang
Proceedings of the 28th International Conference on Computational Linguistics
In online customer service applications, multiple chatbots that specialize in various topics are typically developed separately and then merged with human agents into a single platform that presents users with a unified interface. Ideally, the conversation can be transparently transferred between different sources of customer support so that domain-specific questions are answered in a timely manner; we call this Bot-Agent symbiosis. Conversation transition is a major challenge in such online customer service. Our work formalises the challenge as two core problems, namely when to transfer and which bot or agent to transfer to, and introduces a deep-neural-network-based approach that addresses them. Inspired by the net promoter score (NPS), our research reveals how the problems can be effectively solved by providing user feedback and developing deep neural networks that predict the conversation category distribution and the NPS of the dialogues. Experiments on realistic data generated from an online service support platform demonstrate that the proposed approach outperforms state-of-the-art methods and shows promise for transparent conversation transition.