Xinpeng Li
2026
Benchmarking and Enabling Efficient Chinese Medical Retrieval via Asymmetric Encoders
Angqing Jiang | Jianlyu Chen | Zhefang | Yongcan Wang | Xinpeng Li | Keyu Ding | Defu Lian
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Effective medical text retrieval requires both high accuracy and low latency. While LLM-based embedding models possess powerful retrieval capabilities, their prohibitive latency and high computational cost limit their application in real-time scenarios. Furthermore, the lack of comprehensive and high-fidelity benchmarks hinders progress in Chinese medical text retrieval. In this work, we introduce the **C**hinese **Med**ical **T**ext **E**mbedding **B**enchmark (**CMedTEB**), a benchmark spanning three kinds of practical embedding tasks: retrieval, reranking, and semantic textual similarity (STS). Distinct from purely automated datasets, CMedTEB is curated via a rigorous multi-LLM voting pipeline validated by clinical experts, ensuring gold-standard label quality while effectively mitigating annotation noise. On this foundation, we propose the **C**hinese Medical **A**symmetric **RE**triever (**CARE**), an asymmetric architecture that pairs a lightweight BERT-style encoder for online query encoding with a powerful LLM-based encoder for offline document encoding. However, optimizing such an asymmetric retriever with two structurally different encoders presents distinctive challenges. To address these challenges, we introduce a novel two-stage training strategy that progressively bridges the query and document representations. Extensive experiments demonstrate that CARE surpasses state-of-the-art symmetric models on CMedTEB, achieving superior retrieval performance without increasing inference latency.
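The asymmetric setup described in the abstract (an expensive encoder run offline over documents, a cheap encoder run online over queries, both writing into one shared vector space) can be illustrated with a minimal sketch. This is not the CARE implementation and does not reflect its training strategy: both encoders here are toy hashed character-feature functions standing in for the BERT-style and LLM-based models, and every name (`encode_query_light`, `encode_document_heavy`, `build_index`, `search`, `DIM`) is hypothetical.

```python
import math

DIM = 64  # shared embedding dimension (an assumption for this sketch)


def _bucket(token):
    """Deterministic polynomial hash of a token into one of DIM buckets."""
    h = 0
    for ch in token:
        h = (h * 31 + ord(ch)) % DIM
    return h


def _normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def encode_query_light(text):
    """Toy stand-in for the lightweight online query encoder (unigrams only)."""
    vec = [0.0] * DIM
    for ch in text:
        vec[_bucket(ch)] += 1.0
    return _normalize(vec)


def encode_document_heavy(text):
    """Toy stand-in for the costlier offline document encoder (unigrams + bigrams)."""
    vec = [0.0] * DIM
    for ch in text:
        vec[_bucket(ch)] += 1.0
    for a, b in zip(text, text[1:]):
        vec[_bucket(a + b)] += 0.5
    return _normalize(vec)


def build_index(docs):
    """Offline step: embed every document once with the heavy encoder."""
    return [(doc, encode_document_heavy(doc)) for doc in docs]


def search(query, index, top_k=1):
    """Online step: embed the query with the light encoder, rank by dot product."""
    q = encode_query_light(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    index = build_index(["aspirin dosage for adults", "knee surgery recovery"])
    print(search("aspirin dose", index, top_k=1))
```

Only `encode_query_light` runs at query time, so online latency is set by the small encoder regardless of how expensive the document encoder is. In this toy the two encoders are comparable by construction because they share hash buckets; with real, structurally different models that alignment has to be learned, which is the difficulty the abstract's two-stage training strategy targets.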