Yixian Shen


2025

MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection
Yixian Shen | Qi Bi | Jia-hong Huang | Hongyi Zhu | Andy D. Pimentel | Anuj Pathania
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present MaCP (Minimal yet Mighty adaptive Cosine Projection), a new adaptation method that achieves exceptional performance while requiring minimal parameters and memory for fine-tuning large foundation models. Its general idea is to exploit the superior energy compaction and decorrelation properties of cosine projection to improve both model efficiency and accuracy. Specifically, it projects the weight change from the low-rank adaptation into the discrete cosine space. The weight change is then partitioned over different levels of the discrete cosine spectrum, and each partition's most critical frequency components are selected. Extensive experiments demonstrate the effectiveness of MaCP across a wide range of single-modality tasks, including natural language understanding, natural language generation, and text summarization, as well as multi-modality tasks such as image classification and video understanding. MaCP consistently delivers superior accuracy, significantly reduced computational complexity, and lower memory requirements compared to existing alternatives.
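The recipe described above (project the low-rank weight change with a discrete cosine transform, partition the spectrum into bands, keep each band's strongest coefficients, and transform back) can be illustrated with a short NumPy sketch. The three-band split, the per-band budget, and the function names below are illustrative assumptions rather than the authors' implementation:

```python
# Minimal sketch of cosine-projection-based adaptation in the spirit of MaCP.
# The band partitioning and per-band budget are illustrative assumptions.
import numpy as np
from scipy.fft import dctn, idctn

def cosine_project_update(delta_w: np.ndarray, keep_per_band: int = 64) -> np.ndarray:
    """Project a weight change into the 2-D DCT domain, keep only the
    strongest coefficients inside coarse frequency bands, and map back."""
    spectrum = dctn(delta_w, norm="ortho")      # energy-compacting cosine projection
    rows, cols = spectrum.shape

    # Partition the spectrum into three coarse bands by diagonal frequency index.
    freq = np.add.outer(np.arange(rows), np.arange(cols))
    cut1, cut2 = freq.max() // 3, 2 * freq.max() // 3
    bands = [freq <= cut1, (freq > cut1) & (freq <= cut2), freq > cut2]

    mask = np.zeros(spectrum.size, dtype=bool)
    magnitudes = np.abs(spectrum).ravel()
    for band in bands:
        idx = np.flatnonzero(band)              # flat positions belonging to this band
        top = idx[np.argsort(magnitudes[idx])[-keep_per_band:]]
        mask[top] = True                        # keep the most critical components

    sparse_spectrum = np.where(mask.reshape(spectrum.shape), spectrum, 0.0)
    return idctn(sparse_spectrum, norm="ortho") # back to the weight space

# Example: sparsify a rank-8 LoRA-style update for a 256x256 weight matrix.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta_w = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 256))
    compact = cosine_project_update(delta_w, keep_per_band=64)
    print(compact.shape)                        # (256, 256)
```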

NeuroAda: Activating Each Neuron’s Potential for Parameter-Efficient Fine-Tuning
Zhi Zhang | Yixian Shen | Congfeng Cao | Ekaterina Shutova
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Existing parameter-efficient fine-tuning (PEFT) methods primarily fall into two categories: addition-based and selective in-situ adaptation. Addition-based methods such as LoRA introduce additional modules to adapt the model to downstream tasks, offering strong memory efficiency. However, their representational capacity is often limited, making them less suitable for fine-grained adaptation. In contrast, selective methods directly fine-tune a carefully chosen subset of the original model parameters, allowing for more precise and effective adaptation, but at the cost of significantly increased memory consumption. To reconcile this trade-off, we propose NeuroAda, a novel PEFT method that enables fine-grained model fine-tuning while maintaining high memory efficiency. Our approach first identifies important parameters (i.e., connections within the network) as in selective adaptation, and then introduces bypass connections for these selected parameters. During fine-tuning, only the bypass connections are updated, leaving the original model parameters frozen. Empirical results on 23+ tasks spanning both natural language generation and understanding demonstrate that NeuroAda achieves state-of-the-art performance with as little as 0.02% trainable parameters, while reducing CUDA memory usage by up to 60%. We release our code here: https://github.com/FightingFighting/NeuroAda.git.
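A minimal PyTorch sketch of the bypass idea is shown below; the magnitude-based importance criterion and the dense additive patch are simplifications for illustration, and the released repository above remains the authoritative implementation:

```python
# Hedged PyTorch sketch of "bypass connections for selected important weights".
# Selection rule and dense patch are assumptions, not the NeuroAda code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BypassLinear(nn.Module):
    """Wrap a frozen nn.Linear and train only a sparse additive bypass."""

    def __init__(self, base: nn.Linear, k_per_neuron: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                 # original weights stay frozen
            p.requires_grad_(False)

        # Select the k largest-magnitude incoming connections per output neuron
        # (one of several plausible importance criteria).
        with torch.no_grad():
            idx = base.weight.abs().topk(k_per_neuron, dim=1).indices
        self.register_buffer("idx", idx)

        # Trainable bypass values, zero-initialised so training starts from
        # the pretrained function.
        self.delta = nn.Parameter(torch.zeros(idx.shape, dtype=base.weight.dtype))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scatter the trained values into an additive update for the selected
        # connections only (dense here for clarity, sparse in a real system).
        patch = torch.zeros_like(self.base.weight).scatter(1, self.idx, self.delta)
        return F.linear(x, self.base.weight + patch, self.base.bias)

# Example: only 4 * 16 bypass values are trainable for a 32->16 layer.
layer = BypassLinear(nn.Linear(32, 16), k_per_neuron=4)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 64
```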

SurveyGen-I: Consistent Scientific Survey Generation with Evolving Plans and Memory-Guided Writing
Jing Chen | Zhiheng Yang | Yixian Shen | Jie Liu | Adam Belloum | Paola Grosso | Chrysa Papagianni
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Survey papers play a critical role in scientific communication by consolidating progress across a field. Recent advances in Large Language Models (LLMs) offer a promising solution by automating key steps in the survey-generation pipeline, such as retrieval, structuring, and summarization. However, existing LLM-based approaches often struggle to maintain coherence across long, multi-section surveys and to provide comprehensive citation coverage. To address these limitations, we introduce SurveyGen-I, an automatic survey generation framework that combines coarse-to-fine retrieval, adaptive planning, and memory-guided generation. SurveyGen-I performs survey-level retrieval to construct the initial outline and writing plan, then dynamically refines both during generation through a memory mechanism that stores previously written content and terminology, ensuring coherence across subsections. When the system detects insufficient context, it triggers fine-grained subsection-level retrieval. Experiments across six scientific domains demonstrate that SurveyGen-I consistently outperforms previous approaches in content quality, consistency, and citation coverage. The code is available at https://github.com/SurveyGens/SurveyGen-I.
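For readers who prefer pseudocode, the loop described above (coarse retrieval, initial plan, memory-guided writing with on-demand fine-grained retrieval and plan refinement) could be organized roughly as in the sketch below; every callable passed into the function is a hypothetical placeholder for an LLM or retrieval component, not part of the SurveyGen-I API:

```python
# Rough control-flow sketch of a memory-guided survey-writing loop.
from typing import Callable, List

def generate_survey(
    topic: str,
    retrieve_surveys: Callable[[str], list],           # coarse, survey-level retrieval
    retrieve_subsection: Callable[[str], list],        # fine-grained, on-demand retrieval
    draft_outline: Callable[[str, list], List[str]],   # initial outline + writing plan
    refine_plan: Callable[[List[str], List[str]], List[str]],
    enough_context: Callable[[str, list], bool],
    write_section: Callable[[str, list, List[str]], str],
    summarize: Callable[[str], str],
) -> str:
    papers = retrieve_surveys(topic)
    plan = draft_outline(topic, papers)
    memory: List[str] = []                             # previously written content and terms
    written: List[str] = []
    i = 0
    while i < len(plan):
        heading = plan[i]
        context = papers
        if not enough_context(heading, context):       # insufficient context detected
            context = context + retrieve_subsection(heading)
        text = write_section(heading, context, memory)
        memory.append(summarize(text))                 # keep terminology for coherence
        plan = refine_plan(plan, memory)               # evolve the plan as writing proceeds
        written.append(text)
        i += 1
    return "\n\n".join(written)
```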

SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation
Yixian Shen | Qi Bi | Jia-hong Huang | Hongyi Zhu | Andy D. Pimentel | Anuj Pathania
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Low-rank adaptation (LoRA) has been demonstrated to be effective in reducing the number of trainable parameters when fine-tuning a large foundation model. However, it still encounters computational and memory challenges when scaling to larger models or addressing more complex task adaptation. In this work, we introduce **Sparse Spectrum Adaptation via Discrete Hartley Transformation (SSH)**, a novel approach that significantly reduces the number of trainable parameters while enhancing model performance. It selects the most informative spectral components across all layers, under the guidance of the initial weights after a discrete Hartley transformation (DHT). A lightweight inverse DHT then projects the spectrum back into the spatial domain for weight updates. Extensive experiments across both single-modality tasks, such as language understanding and generation, and multi-modality tasks, such as video-text understanding, demonstrate that SSH outperforms existing parameter-efficient fine-tuning (PEFT) methods while achieving substantial reductions in computational cost and memory requirements. For instance, during instruction tuning on the LLaMA3.1 8B model, SSH achieves higher accuracy with only 0.048M trainable parameters compared to LoRA's 33.5M, while reducing computational intensity by up to 55% compared to FourierFT.
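The selection-and-projection step described above can be sketched in NumPy as follows, using a separable row-column Hartley transform computed via the FFT; the largest-magnitude selection rule and the scaling convention are assumptions made for illustration and may not match the paper's exact variant:

```python
# Hedged sketch of DHT-based sparse spectrum adaptation in the spirit of SSH.
import numpy as np

def dht(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """1-D discrete Hartley transform along `axis` (cas kernel = Re - Im of the FFT)."""
    f = np.fft.fft(x, axis=axis)
    return f.real - f.imag

def dht2(x: np.ndarray) -> np.ndarray:
    """Separable 2-D DHT: apply the 1-D transform along rows, then columns."""
    return dht(dht(x, axis=0), axis=1)

def idht2(h: np.ndarray) -> np.ndarray:
    """The separable DHT is its own inverse up to a 1/(M*N) scale."""
    m, n = h.shape
    return dht2(h) / (m * n)

def select_locations(w0: np.ndarray, k: int):
    """Keep the k spectral positions where the *initial* weights carry the most
    energy after the DHT; only these coefficients would be trained."""
    spec = dht2(w0)
    flat = np.argsort(np.abs(spec).ravel())[-k:]
    return np.unravel_index(flat, spec.shape)

def apply_update(w0: np.ndarray, coeffs: np.ndarray, locations) -> np.ndarray:
    """Place the k learned coefficients into a sparse spectrum and project it
    back to the spatial domain with the lightweight inverse DHT."""
    sparse = np.zeros_like(w0)
    sparse[locations] = coeffs
    return w0 + idht2(sparse)

# Example: adapt a 128x128 weight with only 200 trainable spectral coefficients.
rng = np.random.default_rng(0)
w0 = rng.standard_normal((128, 128))
loc = select_locations(w0, k=200)
w_new = apply_update(w0, rng.standard_normal(200) * 1e-3, loc)
```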