Wei Lin
2026
DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain
Song Jin | Juntian Zhang | Xun Zhang | Zeying Tian | Fei Jiang | Guojun Yin | Wei Lin | Yong Liu | Rui Yan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain remains constrained by benchmarks that rely on coarse-grained categories, single-view imagery, and inaccurate metadata. To bridge this gap, we introduce DiningBench, a hierarchical, multi-view benchmark designed to evaluate VLMs across three levels of cognitive complexity: Fine-Grained Classification, Nutrition Estimation, and Visual Question Answering. Unlike previous datasets, DiningBench comprises 3,021 distinct dishes with an average of 5.27 images per entry, incorporating fine-grained "hard" negatives from identical menus and rigorous, verification-based nutritional data. We conduct an extensive evaluation of 29 state-of-the-art open-source and proprietary models. Our experiments reveal that while current VLMs excel at general reasoning, they struggle significantly with fine-grained visual discrimination and precise nutritional reasoning. Furthermore, we systematically investigate the impact of multi-view inputs and Chain-of-Thought reasoning, identifying five primary failure modes. DiningBench serves as a challenging testbed to drive the next generation of food-centric VLM research. All code is released at https://github.com/meituan/DiningBench.
Privacy-Preserving Reasoning with Knowledge-Distilled Parametric Retrieval Augmented Generation
Jinwen Chen | Hainan Zhang | Liang Pang | Yongxin Tong | Haibo Zhou | Wei Lin | Zhiming Zheng
Findings of the Association for Computational Linguistics: ACL 2026
Current RAG systems require uploading plaintext documents to the cloud, risking private data leakage. Parametric RAG (PRAG) encodes documents as LoRA parameters within LLMs, offering a possible way to reduce exposure of raw content. However, it still faces two issues: (1) PRAG demands synthesizing QA pairs and fine-tuning the LLM for each individual document to create its corresponding LoRA, leading to unacceptable inference latency. (2) The performance of PRAG relies solely on synthetic QA data while lacking internal alignment with standard RAG, resulting in poor generalization on out-of-distribution (OOD) inputs. Therefore, achieving high-efficiency parameterization while maintaining RAG-level performance remains a critical challenge for privacy-preserving reasoning. In this paper, we propose DistilledPRAG, a generalizable knowledge-distilled parametric RAG model aligned with standard RAG in document structure and parameter activation. We first synthesize QA pairs from single and multiple documents to enhance cross-document reasoning. Then, we mask the plaintext documents with a special token and translate them to LoRA via a parameter generator, maintaining the standard RAG document structure. Finally, guided by synthetic QA data, we train the parameter generator to match standard RAG’s hidden states and output logits, enabling RAG-style reasoning without original documents. Experiments on four QA datasets show that DistilledPRAG outperforms baselines in accuracy and generalizes well on OOD data.
Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents
Miao Su | Yucan Guo | Zhongni Hou | Long Bai | Zixuan Li | Yufei Zhang | Guojun Yin | Wei Lin | Xiaolong Jin | Jiafeng Guo | Xueqi Cheng
Findings of the Association for Computational Linguistics: ACL 2026
Memory enables Large Language Model (LLM) agents to perceive, store, and use information from past dialogues, which is essential for personalization. However, existing methods fail to properly model the temporal dimension of memory in two aspects: 1) Temporal inaccuracy: memories are organized by dialogue time rather than their actual occurrence time; 2) Temporal fragmentation: existing methods focus on point-wise memory, losing durative information that captures persistent states and evolving patterns. To address these limitations, we propose Temporal Semantic Memory (TSM), a memory framework that models semantic time for point-wise memory and supports the construction and utilization of durative memory. During memory construction, it first builds a semantic timeline rather than a dialogue one. Then, it consolidates temporally continuous and semantically related information into a durative memory. During memory utilization, it incorporates the query’s temporal intent on the semantic timeline, enabling the retrieval of temporally appropriate durative memories and providing time-valid, duration-consistent context to support response generation. Experiments on LongMemEval and LoCoMo show that TSM consistently outperforms existing methods and achieves up to 12.2% absolute improvement in accuracy, demonstrating the effectiveness of the proposed method.
AutoSearch: Adaptive Search Depth for Efficient Agentic RAG via Reinforcement Learning
Jingbo Sun | Wenyue Chong | Songjun Tu | Qichao Zhang | Yaocheng Zhang | Jiajun Chai | Xiaohan Wang | Wei Lin | Guojun Yin | Dongbin Zhao
Findings of the Association for Computational Linguistics: ACL 2026
Agentic retrieval-augmented generation (RAG) systems enable large language models (LLMs) to solve complex tasks through multi-step interaction with external retrieval tools. However, such multi-step interaction often involves redundant search steps, incurring substantial computational cost and latency. Prior work limits search depth (i.e., the number of search steps) to reduce cost, but this often leads to underexploration of complex questions. To address this, we first investigate how search depth affects accuracy and find a minimal sufficient search depth that defines an accuracy-efficiency trade-off, jointly determined by question complexity and the agent’s capability. Furthermore, we propose AutoSearch, a reinforcement learning framework that evaluates each search step via self-generated intermediate answers. By a self-answering mechanism, AutoSearch identifies the minimal sufficient search depth and promotes efficient search by rewarding its attainment while penalizing over-searching. In addition, reward mechanisms are introduced to stabilize search behavior and improve answer quality on complex questions. Extensive experiments on multiple benchmarks show that AutoSearch achieves a superior accuracy-efficiency trade-off, alleviating over-searching while preserving search quality.
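As a rough illustration of the "minimal sufficient search depth" idea in the abstract above, the following sketch finds the first search step whose intermediate answer is already correct and scores a trajectory by how far it overshoots that step. This is not the authors' implementation: the function names, the exact-match comparison against a gold answer, and the linear per-step `penalty` are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def minimal_sufficient_depth(intermediate_answers: Sequence[str], gold: str) -> Optional[int]:
    """Return the smallest 1-based search depth whose intermediate answer
    matches the gold answer (case-insensitive exact match), or None.

    Assumption: exact match stands in for whatever correctness check is used.
    """
    target = gold.strip().lower()
    for depth, answer in enumerate(intermediate_answers, start=1):
        if answer.strip().lower() == target:
            return depth
    return None


def depth_reward(steps_taken: int, min_depth: Optional[int], penalty: float = 0.1) -> float:
    """Hypothetical reward shaping: full reward for stopping exactly at the
    minimal sufficient depth, reduced by `penalty` per extra search step,
    and zero if the agent stopped too early or no depth was sufficient.
    """
    if min_depth is None or steps_taken < min_depth:
        return 0.0
    return max(0.0, 1.0 - penalty * (steps_taken - min_depth))


if __name__ == "__main__":
    answers = ["Paris?", "Lyon", "paris", "paris"]
    d = minimal_sufficient_depth(answers, "Paris")
    print(d, depth_reward(len(answers), d))
```

In this toy trajectory the third step is the first sufficient one, so the fourth search counts as one step of over-searching.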
Mem2Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation
Zihao Cheng | Zeming Liu | Yingyu Shan | Xinyi Wang | Xiangrong Zhu | Yunpu Ma | Hongru Wang | Yuhang Guo | Wei Lin | Yunhong Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While large language model–powered agents can self-evolve by accumulating experience or by dynamically creating new assets (i.e., tools or expert agents), existing frameworks typically treat these two evolutionary processes in isolation. This separation overlooks their intrinsic interdependence: the former is inherently bounded by a manually predefined static toolset, while the latter generates new assets from scratch without experiential guidance, leading to limited capability growth and unstable evolution. To address this limitation, we introduce a novel paradigm of co-evolutionary Capability Expansion and Experience Distillation. Guided by this paradigm, we propose **Mem2Evolve**, which integrates two core components: **Experience Memory** and **Asset Memory**. Specifically, Mem2Evolve leverages accumulated experience to guide the dynamic creation of assets, thereby expanding the agent’s capability space while simultaneously acquiring new experience to achieve co-evolution. Extensive experiments across 6 task categories and 8 benchmarks demonstrate that Mem2Evolve achieves improvements of 18.53% over standard LLMs, 11.80% over agents evolving solely through experience, and 6.46% over those evolving solely through asset creation, establishing it as a substantially more effective and stable self-evolving agent framework.
2025
Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems
Song Jin | Juntian Zhang | Yuhan Liu | Xun Zhang | Yufei Zhang | Guojun Yin | Fei Jiang | Wei Lin | Rui Yan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Evaluating and iterating upon recommender systems is crucial, yet traditional A/B testing is resource-intensive, and offline methods struggle with dynamic user-platform interactions. While agent-based simulation is promising, existing platforms often lack a mechanism for user actions to dynamically reshape the environment. To bridge this gap, we introduce RecInter, a novel agent-based simulation platform for recommender systems featuring a robust interaction mechanism. In the RecInter platform, simulated user actions (e.g., likes, reviews, purchases) dynamically update item attributes in real time, and newly introduced Merchant Agents can reply, fostering a more realistic and evolving ecosystem. High-fidelity simulation is ensured through a Multidimensional User Profiling module, an Advanced Agent Architecture, and an LLM fine-tuned on Chain-of-Thought (CoT) enriched interaction data. Our platform achieves significantly improved simulation credibility and successfully replicates emergent phenomena like Brand Loyalty and the Matthew Effect. Experiments demonstrate that this interaction mechanism is pivotal for simulating realistic system evolution, establishing our platform as a credible testbed for recommender systems research. All code is released at https://github.com/jinsong8/RecInter.
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
Yuqian Fu | Yuanheng Zhu | Jiajun Chai | Guojun Yin | Wei Lin | Qichao Zhang | Dongbin Zhao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Ensembling large language models (LLMs) can effectively combine diverse strengths of different models, offering a promising approach to enhance performance across various tasks. However, existing methods typically rely on fixed weighting strategies that fail to adapt to the dynamic, context-dependent characteristics of LLM capabilities. In this work, we propose **R**einforcement **L**earning-**A**ssisted **E**nsemble for LLMs (RLAE), a novel framework that reformulates LLM ensemble through the lens of a Markov Decision Process (MDP). Our approach introduces an RL agent that dynamically adjusts ensemble weights by considering both input context and intermediate generation states, with the agent being trained using rewards that directly correspond to the quality of final outputs. We implement RLAE using both single-agent and multi-agent reinforcement learning algorithms (RLAE_PPO and RLAE_MAPPO), demonstrating substantial improvements over conventional ensemble methods. Extensive evaluations on a diverse set of tasks show that RLAE outperforms existing approaches by up to 3.3% accuracy points, offering a more effective framework for LLM ensembling. Furthermore, our method exhibits superior generalization capabilities across different tasks without the need for retraining, while simultaneously achieving lower latency. The source code is available here.
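To make the notion of ensemble weights in the abstract above concrete, here is a minimal sketch of one common way to ensemble language models: mixing their next-token probability distributions with a weight per model. It is only an assumption that the weights act at this level. The abstract does not specify it, and the function name and toy distributions are hypothetical. In an RL-assisted setup such as the one described, the `weights` argument would be chosen by the agent at each step rather than fixed.

```python
from typing import Dict, List


def ensemble_next_token(dists: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Mix per-model next-token distributions using normalized weights.

    dists[i] maps token -> probability under model i; weights[i] >= 0.
    """
    if len(dists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mixed: Dict[str, float] = {}
    for dist, w in zip(dists, weights):
        for token, prob in dist.items():
            # Accumulate each model's weighted contribution for this token.
            mixed[token] = mixed.get(token, 0.0) + (w / total) * prob
    return mixed


if __name__ == "__main__":
    model_a = {"yes": 0.6, "no": 0.4}
    model_b = {"yes": 0.2, "no": 0.8}
    # Static weights here; a learned policy would supply them per step.
    print(ensemble_next_token([model_a, model_b], [1.0, 3.0]))
```

A fixed weighting strategy corresponds to passing the same `weights` at every step, which is the baseline the abstract contrasts against.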
UIOrchestra: Generating High-Fidelity Code from UI Designs with a Multi-agent System
Chuhuai Yue | Jiajun Chai | Yufei Zhang | Zixiang Ding | Xihao Liang | Peixin Wang | Shihai Chen | Wang Yixuan | Wangyanping | Guojun Yin | Wei Lin
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advances in large language models (LLMs) have significantly improved automated code generation, enabling tools such as GitHub Copilot and CodeWhisperer to assist developers in a wide range of programming tasks. However, the translation of complex mobile UI designs into high-fidelity front-end code remains a challenging and underexplored area, especially as modern app interfaces become increasingly intricate. In this work, we propose UIOrchestra, a collaborative multi-agent system designed for the AppUI2Code task, which aims to reconstruct static single-page applications from design mockups. UIOrchestra integrates three specialized agents (a layout description agent, a code generation agent, and a difference analysis agent) that work collaboratively to address the limitations of single-model approaches. To facilitate robust evaluation, we introduce APPUI, the first benchmark dataset for AppUI2Code, constructed through a human-in-the-loop process to ensure data quality and coverage. Experimental results demonstrate that UIOrchestra outperforms existing methods in reconstructing complex app pages and highlight the necessity of multi-agent collaboration for this task. We hope our work will inspire further research on leveraging LLMs for front-end automation. The code and data will be released upon paper acceptance.
Co-authors
- Guojun Yin 6
- Jiajun Chai 3
- Yufei Zhang 3
- Fei Jiang 2
- Song Jin 2
- Rui Yan 2
- Juntian Zhang 2
- Xun Zhang 2
- Qichao Zhang 2
- Dongbin Zhao 2
- Long Bai 1
- Jinwen Chen 1
- Shihai Chen 1
- Xueqi Cheng (程学旗) 1
- Zihao Cheng 1
- Wenyue Chong 1
- Zixiang Ding 1
- Yuqian Fu 1
- Yucan Guo 1
- Jiafeng Guo (嘉丰 郭) 1
- Yuhang Guo (郭宇航) 1
- Zhongni Hou 1
- Xiaolong Jin 1
- Zixuan Li 1
- Xihao Liang 1
- Yuhan Liu 1
- Yong Liu 1
- Zeming Liu 1
- Yunpu Ma 1
- Liang Pang (庞亮) 1
- Yingyu Shan 1
- Miao Su 1
- Jingbo Sun 1
- Zeying Tian 1
- Yongxin Tong 1
- Songjun Tu 1
- Peixin Wang 1
- Xiaohan Wang 1
- Xinyi Wang 1
- Hongru Wang 1
- Yunhong Wang 1
- Wangyanping 1
- Wang Yixuan 1
- Chuhuai Yue 1
- Hainan Zhang 1
- Yaocheng Zhang 1
- Zhiming Zheng 1
- Haibo Zhou 1
- Yuanheng Zhu 1
- Xiangrong Zhu 1