Qinyu Chen
2026
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Xin Cheng | Wangding Zeng | Damai Dai | Qinyu Chen | Bingxuan Wang | Zhenda Xie | Kezhao Huang | Xingkai Yu | Zhewen Hao | Han Zhang | Yu-Kun Li | Huishuai Zhang | Dongyan Zhao | Wenfeng Liang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Mixture-of-Experts (MoE) scales capacity via conditional computation, but Transformers lack a native knowledge lookup primitive. We introduce conditional memory, instantiated via Deep Sparse Embedding (DSE), which indexes a massive embedding table using local n-grams for retrieval. We formalize the sparsity allocation problem (how to split a fixed parameter budget between MoE experts and DSE memory) and find a U-shaped scaling law that identifies an optimal balance. Scaled to 27B parameters, DSE outperforms an iso-parameter and iso-FLOPs MoE baseline across knowledge and reasoning benchmarks, and achieves markedly stronger long-context performance. Mechanistic analyses show that DSE offloads early-layer static recall into memory, freeing effective depth and attention for higher-level reasoning. DSE is also infrastructure-efficient: its deterministic hashing enables offloading massive parameters into host memory during inference with negligible throughput overhead.
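The abstract above does not give the hash function, the n-gram order, or the table layout, so the following is only a minimal sketch of the general idea of indexing an embedding table by a deterministic hash of local n-grams. The names `ngram_bucket` and `lookup`, the polynomial hash, the table size, and the choice of bigrams are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def ngram_bucket(tokens, n, table_size):
    """Hash the trailing n token ids to a table row (assumed polynomial hash)."""
    h = 0
    for t in tokens[-n:]:
        # Deterministic: the same n-gram always maps to the same row.
        h = (h * 1_000_003 + t + 1) % table_size
    return h

def lookup(table, token_ids, n=2):
    """Return one memory vector per position, keyed by the n-gram ending there."""
    rows = [
        ngram_bucket(token_ids[: i + 1], n, table.shape[0])
        for i in range(len(token_ids))
    ]
    return table[rows]

# Hypothetical toy table: 1024 rows of 8-dimensional vectors.
rng = np.random.default_rng(0)
table = rng.standard_normal((1024, 8)).astype(np.float32)

# The bigram (5, 17) ends at positions 1 and 3, so both retrieve the same row.
vecs = lookup(table, [5, 17, 5, 17])
```

Because the row index depends only on the token ids, it can be computed without touching the table itself, which is consistent with the abstract's point that deterministic hashing allows the parameters to sit in host memory.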
2025
WIKIGENBENCH: Exploring Full-length Wikipedia Generation under Real-World Scenario
Jiebin Zhang | Eugene J. Yu | Qinyu Chen | Chenhao Xiong | Dawei Zhu | Han Qian | Mingbo Song | Weimin Xiong | Xiaoguang Li | Qun Liu | Sujian Li
Proceedings of the 31st International Conference on Computational Linguistics
Generating comprehensive and accurate Wikipedia articles for newly emerging events in real-world scenarios presents significant challenges. Existing attempts fall short either by focusing only on short snippets or by using metrics that are insufficient to evaluate real-world scenarios. In this paper, we construct WIKIGENBENCH, a new benchmark consisting of 1,320 entries, designed to align with real-world scenarios in both generation and evaluation. For generation, we explore a real-world scenario where structured, full-length Wikipedia articles with citations are generated for new events using input documents from web sources. For evaluation, we integrate systematic metrics and LLM-based metrics to assess the verifiability, organization, and other aspects aligned with real-world scenarios. Based on this benchmark, we conduct extensive experiments using various models within three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with a fine-tuned generation model. Experimental results show that hierarchical structure-based methods can generate more comprehensive content, while fine-tuned methods achieve better verifiability. However, even the best methods still show a significant gap compared to existing Wikipedia content, indicating that further research is necessary.
2023
Exploring In-Context Learning for Knowledge Grounded Dialog Generation
Qinyu Chen | Wenhao Wu | Sujian Li
Findings of the Association for Computational Linguistics: EMNLP 2023
Large neural dialog generation models have been applied in many real-life scenarios, yet they are prone to hallucination and tend to produce factually inaccurate outputs, which raises great concerns. To alleviate this problem, we propose a plug-and-play retrieval-based framework, IKA, which leverages in-context learning and retrieval techniques to enhance LLMs on knowledge grounded dialog generation. We design thorough experiments on a large-scale knowledge graph with 1M+ facts to investigate the effectiveness and generalization of our framework. Experiments show that our method surpasses the previous training-based SOTA by a large margin, specifically 46.67% in BLEU4, 26.01% in ROUGE-L, 122.90% in BARTScore and 30.50% in Entity Coverage F1. Further analysis shows promising abilities of LLMs to perform knowledge-intensive tasks, a capability that was previously considered weak and understudied.