Anuj Pathania


2025

MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection
Yixian Shen | Qi Bi | Jia-hong Huang | Hongyi Zhu | Andy D. Pimentel | Anuj Pathania
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a new adaptation method, MaCP (Minimal yet Mighty adaptive Cosine Projection), that achieves exceptional performance while requiring minimal parameters and memory for fine-tuning large foundation models. Its general idea is to exploit the superior energy compaction and decorrelation properties of cosine projection to improve both model efficiency and accuracy. Specifically, it projects the weight change from the low-rank adaptation into the discrete cosine space. Then, the weight change is partitioned over different levels of the discrete cosine spectrum, and each partition's most critical frequency components are selected. Extensive experiments demonstrate the effectiveness of MaCP across a wide range of single-modality tasks, including natural language understanding, natural language generation, and text summarization, as well as multi-modality tasks such as image classification and video understanding. MaCP consistently delivers superior accuracy, significantly reduced computational complexity, and lower memory requirements compared to existing alternatives.
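The mechanism described in the abstract (project the low-rank weight change into the discrete cosine space, partition the spectrum into levels, keep only each partition's most important coefficients) can be illustrated with a small NumPy/SciPy sketch. This is not the authors' implementation; the diagonal banding scheme and the names sparse_cosine_update, n_levels, and keep_per_level are assumptions made purely for illustration.

```python
# Minimal illustrative sketch of the idea in the abstract, not the authors' code.
import numpy as np
from scipy.fft import dctn, idctn

def sparse_cosine_update(delta_w, n_levels=4, keep_per_level=16):
    """Project a weight change into the 2D DCT domain, keep only the largest
    coefficients within each frequency band, and project back."""
    spectrum = dctn(delta_w, norm="ortho")            # cosine projection
    rows, cols = np.indices(spectrum.shape)
    # Partition the spectrum into n_levels diagonal frequency bands (assumed scheme).
    band = ((rows + cols) * n_levels // (sum(spectrum.shape) - 1)).clip(max=n_levels - 1)
    mask = np.zeros(spectrum.shape, dtype=bool)
    for level in range(n_levels):                     # per-band top-k selection
        idx = np.flatnonzero(band == level)
        top = idx[np.argsort(np.abs(spectrum.ravel()[idx]))[-keep_per_level:]]
        mask.ravel()[top] = True
    return idctn(np.where(mask, spectrum, 0.0), norm="ortho")  # back to weight space

# Example: sparsify a rank-4 update of a 64x64 weight matrix.
A, B = np.random.randn(64, 4), np.random.randn(4, 64)
delta_w_sparse = sparse_cosine_update(A @ B)
```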

SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation
Yixian Shen | Qi Bi | Jia-hong Huang | Hongyi Zhu | Andy D. Pimentel | Anuj Pathania
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Low-rank adaptation (LoRA) has been demonstrated to be effective in reducing the number of trainable parameters when fine-tuning large foundation models. However, it still encounters computational and memory challenges when scaling to larger models or addressing more complex task adaptation. In this work, we introduce Sparse Spectrum Adaptation via Discrete Hartley Transformation (SSH), a novel approach that significantly reduces the number of trainable parameters while enhancing model performance. It selects the most informative spectral components across all layers, guided by the initial weights after a discrete Hartley transformation (DHT). A lightweight inverse DHT then projects the spectrum back into the spatial domain for weight updates. Extensive experiments across both single-modality tasks, such as language understanding and generation, and multi-modality tasks, such as video-text understanding, demonstrate that SSH outperforms existing parameter-efficient fine-tuning (PEFT) methods while achieving substantial reductions in computational cost and memory requirements. For instance, during instruction tuning on the LLaMA3.1 8B model, SSH achieves higher accuracy with only 0.048M trainable parameters compared to LoRA's 33.5M, while reducing computational intensity by up to 55% compared to FourierFT.
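A rough sketch of the mechanism described here, again in NumPy and not the authors' code: a separable 2D discrete Hartley transform built from the FFT, with the trainable coefficients placed at the spectral positions that are most energetic in the initial weights. The names dht2, ssh_style_update, and n_keep are hypothetical, and the top-magnitude selection rule is an assumption about how "most informative" components might be chosen.

```python
# Minimal illustrative sketch, not the authors' implementation.
import numpy as np

def dht2(x):
    """Separable 2D discrete Hartley transform via the FFT (H = Re(F) - Im(F)
    along each axis). Applying it twice recovers the input up to a 1/(M*N) factor."""
    for axis in (0, 1):
        f = np.fft.fft(x, axis=axis)
        x = f.real - f.imag
    return x

def ssh_style_update(w_init, coeffs, idx):
    """Place a few trainable spectral coefficients at positions chosen from the
    initial weights' Hartley spectrum, then apply the lightweight inverse DHT."""
    spectrum = np.zeros_like(w_init)
    spectrum.ravel()[idx] = coeffs                    # the only trainable parameters
    return dht2(spectrum) / w_init.size               # inverse DHT (self-inverse up to 1/(M*N))

# Pick the n_keep most energetic spectral positions of the initial weights,
# then learn only the coefficients at those positions during fine-tuning.
w_init = np.random.randn(64, 64)
n_keep = 128
idx = np.argsort(np.abs(dht2(w_init)).ravel())[-n_keep:]
coeffs = np.zeros(n_keep)                             # would be updated by the optimizer
delta_w = ssh_style_update(w_init, coeffs, idx)
```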