Zeju Qiu
2026
PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers in Overleaf
Jiarui Liu | Terry Jingchen Zhang | Ryan Faulkner | Xuanqiang Angelo Huang | Vilém Zouhar | Dominik Glandorf | Isabel Dahlgren | Rishit Dagli | Yuen Chen | Felix Leeb | Van Q. Truong | Punya Syon Pandey | Yves Bicker | Suvajit Majumder | Wenyuan Jiang | Zeju Qiu | Sankalan Pal Chowdhury | Mrinmaya Sachan | Bernhard Schölkopf | Mona T. Diab | Zhijing Jin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Expert writing feedback from experienced researchers is critical for early-career scholars to improve their manuscripts, yet high-quality feedback often remains scarce because reviewing research papers is labor-intensive. Emerging AI-powered writing assistants largely focus on grammar fixes or simulating peer review with final scores, yet they fall short of providing concrete, actionable suggestions that help students improve their papers during drafting. We present PaperMentor, a human-centered writing assistant system that delivers actionable suggestions as Overleaf-native inline comments while leaving the actual writing entirely to human authors. PaperMentor integrates an expert skill library carefully curated from established researchers’ writing advice with 12 specialized agents covering different aspects of paper writing, such as formatting compliance, phrasing accuracy, and terminology consistency. In a user study (n=14), 90.6% of the generated comments were rated actionable and 67.5% were rated valid, significantly outperforming a GPT-5.2 baseline without the skill library. We release PaperMentor as open source for public use.
2025
Orthogonal Finetuning Made Scalable
Zeju Qiu | Weiyang Liu | Adrian Weller | Bernhard Schölkopf
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. We further introduce the Cayley–Neumann parameterization, an efficient orthogonal parameterization that approximates the matrix inversion in the Cayley transform via a truncated Neumann series. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without compromising performance. In addition, we extend OFTv2 to support finetuning quantized foundation models and show that it outperforms the popular QLoRA in training stability, efficiency, and memory usage.
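The two ideas in this abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the Cayley convention R = (I + Q)(I - Q)^{-1}, the placement of R on the input side of the weight, the truncation depth k, and every function and variable name below are assumptions made for illustration. The sketch replaces the matrix inverse with a truncated Neumann series (valid when the norm of Q is below 1) and contrasts a weight-centric product with an input-centric one.

```python
import numpy as np

def skew(A):
    # Skew-symmetric part of A, so that the Cayley transform is orthogonal.
    return 0.5 * (A - A.T)

def cayley_exact(Q):
    # Reference Cayley transform using an explicit matrix inverse.
    I = np.eye(Q.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)

def cayley_neumann(Q, k=5):
    # Approximate (I - Q)^{-1} by I + Q + Q^2 + ... + Q^k (assumes ||Q|| < 1).
    I = np.eye(Q.shape[0])
    approx_inv = I.copy()
    term = I.copy()
    for _ in range(k):
        term = term @ Q
        approx_inv += term
    return (I + Q) @ approx_inv

rng = np.random.default_rng(0)
d, d_out = 8, 4
Q = skew(rng.normal(size=(d, d))) * 0.02  # small norm keeps the series accurate
W = rng.normal(size=(d_out, d))           # stand-in for a frozen pretrained weight
x = rng.normal(size=d)                    # a single input vector

R = cayley_neumann(Q, k=5)

# Weight-centric: build the adapted weight first (matrix-matrix product).
y_weight_centric = (W @ R) @ x
# Input-centric: transform the input first (two matrix-vector products).
y_input_centric = W @ (R @ x)
```

Both orderings give the same output by associativity. The input-centric form simply never materializes the adapted weight matrix, which is where the cubic-cost product would otherwise occur.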
2024
In Defense of Structural Sparse Adapters for Concurrent LLM Serving
Junda Su | Zirui Liu | Zeju Qiu | Weiyang Liu | Zhaozhuo Xu
Findings of the Association for Computational Linguistics: EMNLP 2024
Adapting large language models (LLMs) to specific tasks remains challenging due to the extensive retraining required, prompting the need for efficient adapter techniques. However, concurrently serving multiple adapters, each with unique matrix shapes, poses significant system-level challenges. To address these issues, we identify an opportunity in structurally sparse adapters, which, unlike low-rank adapters, maintain consistent matrix shapes while varying in sparsity patterns. Leveraging this characteristic, we introduce SpartanServe, a system designed for efficient concurrent serving of LLMs using multiple structurally sparse adapters. SpartanServe employs a unified matrix multiplication operation and a novel memory management technique to enable effective batching. Furthermore, Triton kernels accelerate matrix multiplication in the serving process. Experimental results demonstrate that SpartanServe achieves a 2.12× speedup over S-LoRA when serving 96 adapters on a single NVIDIA A100 GPU (40GB), showcasing its efficacy in concurrent LLM serving.
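As a rough illustration of why consistent matrix shapes help batching, the following NumPy sketch stacks several sparse adapter deltas of identical shape and serves a mixed batch with one batched operation. This is not SpartanServe's kernel or memory manager: the deltas are stored densely here for simplicity, and the mask density, sizes, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_adapters, batch = 6, 4, 3, 5

W = rng.normal(size=(d_out, d_in))  # shared base weight

# Each adapter is a delta with the same shape as W but its own sparsity mask.
masks = rng.random(size=(n_adapters, d_out, d_in)) < 0.25
deltas = rng.normal(size=(n_adapters, d_out, d_in)) * masks

x = rng.normal(size=(batch, d_in))
adapter_ids = np.array([0, 2, 1, 2, 0])  # which adapter each request uses

# One batched computation: shared base product plus per-request sparse delta.
y = x @ W.T + np.einsum("boi,bi->bo", deltas[adapter_ids], x)

# Reference: handle each request separately with its own adapted weight.
y_ref = np.stack([(W + deltas[a]) @ x[i] for i, a in enumerate(adapter_ids)])
```

Because every delta shares the shape of W, the per-request adapters can be gathered by index and applied together, whereas adapters with differing shapes would need padding or separate calls.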
Co-authors
- Weiyang Liu 2
- Bernhard Schölkopf 2
- Yves Bicker 1
- Yuen Chen 1
- Rishit Dagli 1
- Isabel Dahlgren 1
- Mona Diab 1
- Ryan Faulkner 1
- Dominik Glandorf 1
- Xuanqiang Angelo Huang 1
- Wenyuan Jiang 1
- Zhijing Jin 1
- Felix Leeb 1
- Jiarui Liu 1
- Zirui Liu 1
- Suvajit Majumder 1
- Sankalan Pal Chowdhury 1
- Punya Syon Pandey 1
- Mrinmaya Sachan 1
- Junda Su 1
- Van Q. Truong 1
- Adrian Weller 1
- Zhaozhuo Xu 1
- Terry Jingchen Zhang 1
- Vilém Zouhar 1