Xuye
2026
From Short Video to Clickable Search: RLVR-Enabled Listwise Query Suggestion with Retrieval-Augmented Context
Mingkai Tian | Xuye | Long Meng | Liwei Chen | Zhiheng Qin | Yi Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Short-video platforms now present tappable search entries beneath the video player, making it effortless for users to shift from passively watching to actively searching for information. Prior work on bottom-bar query generation conditions on titles and OCR to generate a single query per forward pass, constrains decoding with a trie, and evaluates against a single reference using edit-distance-style supervision. This makes it difficult to cover the diverse intents a video can trigger and to credit semantically equivalent query variants. Motivated by these limitations, we propose four complementary improvements. First, we reformulate the task as one-shot list generation, producing multiple distinct queries per video, and build multi-query ground truth from exposure and CTR logs. Second, we redesign offline evaluation with CTR-HungF1, a CTR-weighted set-matching metric computed via optimal assignment over token-level F1 scores. Third, we enrich the context with a video-to-video-to-query (V2V2Q) RAG pipeline that provides behavior-grounded background knowledge. Finally, we apply thinking-free RLVR with deterministic format checks and CTR-HungF1 rewards to train a compact LLM without reward models or CoT distillation. The resulting system yields strong offline and online improvements and has been deployed on Kuaishou to serve hundreds of millions of users daily.
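To make the idea of a CTR-weighted set-matching score more concrete, the following is a minimal sketch, not the paper's definition. It assumes whitespace tokenization, normalizes by the total reference CTR, and finds the optimal one-to-one assignment by brute force rather than with the Hungarian algorithm; the function names `token_f1` and `ctr_weighted_match` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def token_f1(pred: str, ref: str) -> float:
    """Token-level F1 between two queries (whitespace tokens: an assumption)."""
    p, r = pred.split(), ref.split()
    if not p or not r:
        return 0.0
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def ctr_weighted_match(preds, refs_with_ctr):
    """Best one-to-one matching of predictions to references, weighted by CTR.

    refs_with_ctr is a list of (query, ctr) pairs. Normalizing by the total
    CTR is an assumption made for this sketch.
    """
    total = sum(ctr for _, ctr in refs_with_ctr)
    if not preds or total <= 0:
        return 0.0
    # Pad with None so every reference gets a slot even if predictions are few.
    slots = list(preds) + [None] * max(0, len(refs_with_ctr) - len(preds))
    best = 0.0
    # Brute-force search over assignments; adequate only for short lists.
    for perm in permutations(range(len(slots)), len(refs_with_ctr)):
        score = sum(
            ctr * token_f1(slots[i], ref)
            for i, (ref, ctr) in zip(perm, refs_with_ctr)
            if slots[i] is not None
        )
        best = max(best, score)
    return best / total


refs = [("best running shoes", 0.3), ("iphone 15 price", 0.1)]
preds = ["iphone 15 review", "best running shoes"]
score = ctr_weighted_match(preds, refs)
```

In this sketch, a prediction list is rewarded regardless of output order, and matching a high-CTR reference contributes more than matching a low-CTR one. For longer lists, a polynomial-time assignment solver would be a natural replacement for the brute-force loop.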