Nathan Susanj
2026
PlanRAG-Audio: Planning and Retrieval Augmented Generation for Long-form Audio Understanding
Masao Someki | Chien-yu Huang | Siddhant Arora | Samuele Cornell | Markus Müller | Nathan Susanj | Rupak Vignesh Swaminathan | Grant Strimel | Jing Liu | Shinji Watanabe
Findings of the Association for Computational Linguistics: ACL 2026
Long-form audio understanding poses significant challenges for large audio language models (LALMs) due to the extreme length of audio sequences and the need to reason over heterogeneous acoustic cues distributed over time, such as speech content, speaker identity, emotion, and sound events. To address these challenges, we propose PlanRAG-Audio, a planning-based retrieval-augmented generation framework for scalable long-form audio understanding. Rather than having LALMs process entire recordings directly, PlanRAG-Audio explicitly plans which modalities and temporal spans are required for a given query, and retrieves only query-relevant information from a structured text and audio database. This retrieval planning enables effective reasoning over complex, cross-domain audio queries while substantially reducing the input length passed to the large language models. Experiments across a wide range of speech and audio retrieval tasks demonstrate that PlanRAG-Audio improves reasoning accuracy and, by decoupling inference cost from raw audio length, keeps performance stable as audio duration increases.
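The plan-then-retrieve idea can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the PlanRAG-Audio implementation: the `Segment` record, the keyword table standing in for an LLM planner, and the helpers `plan`, `retrieve`, and `build_context` are all hypothetical. What it shows is that the context handed to the language model grows with the number of segments matching the plan, not with the length of the recording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One entry in a hypothetical structured audio database."""
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    modality: str  # "speech", "speaker", "emotion", or "sound_event"
    text: str      # textual description (transcript, speaker ID, event label)


# Hypothetical keyword table standing in for an LLM-based planner.
MODALITY_KEYWORDS = {
    "speech": ("say", "said", "talk", "mention"),
    "speaker": ("who", "speaker"),
    "emotion": ("angry", "happy", "sad", "emotion"),
    "sound_event": ("sound", "noise", "music", "alarm"),
}


def plan(query: str) -> set[str]:
    """Decide which modalities the query needs; fall back to speech."""
    q = query.lower()
    needed = {m for m, kws in MODALITY_KEYWORDS.items() if any(k in q for k in kws)}
    return needed or {"speech"}


def retrieve(db, modalities, span: Optional[tuple[float, float]] = None):
    """Keep segments of the planned modalities that overlap the time span."""
    hits = []
    for seg in db:
        if seg.modality not in modalities:
            continue
        if span is not None and (seg.end <= span[0] or seg.start >= span[1]):
            continue
        hits.append(seg)
    return sorted(hits, key=lambda s: s.start)


def build_context(hits) -> str:
    """Serialize retrieved segments into a compact text prompt for the LLM."""
    return "\n".join(f"[{s.start:.0f}-{s.end:.0f}s] {s.modality}: {s.text}" for s in hits)


if __name__ == "__main__":
    db = [
        Segment(0, 5, "speech", "Let's start the meeting."),
        Segment(40, 42, "sound_event", "fire alarm"),
        Segment(40, 45, "speaker", "spk_B"),
        Segment(300, 305, "emotion", "angry"),
    ]
    mods = plan("Who spoke when the alarm went off?")
    print(build_context(retrieve(db, mods, span=(30, 60))))
```

A real planner would also infer the time span from the query; here it is passed in explicitly to keep the sketch short.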
2025
Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models
Ryan Solgi | Kai Zhen | Rupak Vignesh Swaminathan | Nathan Susanj | Athanasios Mouchtaris | Siegfried Kunzmann | Zheng Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
The efficient implementation of large language models (LLMs) is crucial for deployment on resource-constrained devices. Low-rank tensor compression techniques, such as tensor-train (TT) networks, have been widely studied for over-parameterized neural networks. However, their application to compressing pre-trained LLMs for downstream tasks (post-training) remains challenging due to the high-rank nature of pre-trained LLMs and the lack of access to pretraining data. In this study, we investigate low-rank tensorized LLMs during fine-tuning and propose sparse augmented tensor networks (Saten) to enhance their performance. The proposed Saten framework enables full model compression. Experimental results demonstrate that Saten enhances both accuracy and compression efficiency in tensorized language models, achieving state-of-the-art performance.
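The general idea of augmenting a low-rank factorization with a sparse correction, W ≈ L + S, can be sketched in a few lines. This is an assumption-laden illustration, not Saten itself: the abstract does not say how the sparse component is chosen, and for brevity a rank-1 matrix factorization computed by power iteration stands in for the tensor-train factors. The hypothetical helpers keep the k largest-magnitude entries of the residual W - L as the sparse term S.

```python
import math


def rank1_approx(W, iters=100):
    """Rank-1 approximation sigma * u v^T of matrix W via power iteration."""
    m, n = len(W), len(W[0])
    v = [1.0] * n  # initial guess for the right singular vector
    u = [0.0] * m
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm_u for x in u]
        v = [sum(W[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm_v = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm_v for x in v]
    sigma = sum(u[i] * W[i][j] * v[j] for i in range(m) for j in range(n))
    return [[sigma * u[i] * v[j] for j in range(n)] for i in range(m)]


def sparse_correction(W, L, k):
    """Keep the k largest-magnitude entries of the residual W - L as {(i, j): value}."""
    entries = [(abs(W[i][j] - L[i][j]), i, j)
               for i in range(len(W)) for j in range(len(W[0]))]
    entries.sort(reverse=True)
    return {(i, j): W[i][j] - L[i][j] for _, i, j in entries[:k]}


def reconstruct(L, S):
    """Dense reconstruction L + S."""
    out = [row[:] for row in L]
    for (i, j), value in S.items():
        out[i][j] += value
    return out


def frobenius_error(A, B):
    """Frobenius norm of A - B."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))
```

The stored cost of such a representation is the low-rank factors plus k values and their indices, so k trades compression ratio against reconstruction error.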
Wanda++: Pruning Large Language Models via Regional Gradients
Yifan Yang | Kai Zhen | Bhavana Ganesh | Aram Galstyan | Goeric Huybrechts | Markus Müller | Jonas M. Kübler | Rupak Vignesh Swaminathan | Athanasios Mouchtaris | Sravan Babu Bodapati | Nathan Susanj | Zheng Zhang | Jack FitzGerald | Abhishek Kumar
Findings of the Association for Computational Linguistics: ACL 2025
Large language model (LLM) pruning seeks to remove unimportant weights to speed up inference with minimal accuracy impact. However, existing methods often suffer from accuracy degradation without full-model sparsity-aware fine-tuning. This paper presents Wanda++, a novel pruning framework that outperforms state-of-the-art methods by utilizing decoder-block-level regional gradients. Specifically, Wanda++ is the first to improve the pruning score with regional gradients, and it proposes an efficient regional optimization method to minimize pruning-induced discrepancies between the dense and sparse decoder outputs. Notably, Wanda++ improves perplexity by up to 32% over Wanda in the language modeling task and generalizes effectively to downstream tasks. Moreover, despite updating weights with regional optimization, Wanda++ remains orthogonal to sparsity-aware fine-tuning, and combining it with LoRA reduces perplexity substantially further. Our approach is lightweight, pruning a 7B LLaMA model in under 10 minutes on a single H100 GPU.
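For context, the baseline Wanda criterion that Wanda++ builds on scores each weight by its magnitude times the L2 norm of the corresponding input feature over a calibration set, S_ij = |W_ij| · ||X_j||_2, and compares scores within each output row. The sketch below implements only that baseline score; the regional-gradient term that Wanda++ adds is not specified in the abstract and is omitted. The function names are hypothetical.

```python
import math


def wanda_scores(W, X):
    """Score S[i][j] = |W[i][j]| * ||X[:, j]||_2.

    W: weight matrix as out_features rows of in_features values.
    X: calibration activations as token rows of in_features values.
    """
    in_features = len(W[0])
    col_norms = [math.sqrt(sum(x[j] ** 2 for x in X)) for j in range(in_features)]
    return [[abs(w) * col_norms[j] for j, w in enumerate(row)] for row in W]


def prune_per_output(W, X, sparsity):
    """Zero the lowest-scoring fraction of weights within each output row."""
    scores = wanda_scores(W, X)
    pruned = []
    for w_row, s_row in zip(W, scores):
        k = int(len(w_row) * sparsity)  # number of weights to remove in this row
        drop = set(sorted(range(len(w_row)), key=lambda j: s_row[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned
```

With `W = [[1.0, 1.0, 4.0, 0.5]]` and calibration rows `[10.0, 0.1, 0.1, 1.0]` and `[0.0, 0.0, 0.0, 1.0]`, 50% sparsity removes the weights 1.0 and 4.0 on the two nearly silent input channels, whereas pure magnitude pruning would have kept the 4.0 weight.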
MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
Zhen Zhang | Yifan Yang | Kai Zhen | Nathan Susanj | Athanasios Mouchtaris | Siegfried Kunzmann | Zheng Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models have demonstrated exceptional capabilities across diverse tasks, but their fine-tuning demands significant memory, posing challenges for resource-constrained environments. Zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating the need for backpropagation. However, ZO optimization suffers from high gradient variance, and prior research has largely focused on single-task learning, leaving its application to multi-task learning unexplored. Multi-task learning is crucial for leveraging shared knowledge across tasks to improve generalization, yet it introduces unique challenges under ZO settings, such as amplified gradient variance and collinearity. In this paper, we present MaZO, the first framework specifically designed for multi-task LLM fine-tuning under ZO optimization. MaZO tackles these challenges at the parameter level through two key innovations: a weight importance metric to identify critical parameters and a multi-task weight update mask to selectively update these parameters, reducing the dimensionality of the parameter space and mitigating task conflicts. Experiments demonstrate that MaZO achieves state-of-the-art performance, surpassing even multi-task learning methods designed for first-order optimization.
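The memory advantage of ZO optimization comes from estimating gradients with forward passes only. A common two-point (SPSA-style) estimator perturbs the parameters along a random Gaussian direction z and uses (L(θ + εz) - L(θ - εz)) / (2ε) as the gradient projected onto z. The sketch below adds a binary update mask so that only selected parameters are perturbed and updated. The mask is simply given here: MaZO's weight importance metric and multi-task mask construction are not described in the abstract, so `zo_masked_step` and its arguments are hypothetical.

```python
import random


def zo_masked_step(params, loss_fn, mask, rng, lr=0.05, eps=1e-3):
    """One zeroth-order update using two loss evaluations and no backprop.

    Parameters whose mask entry is False get a zero perturbation and are never changed.
    """
    z = [rng.gauss(0.0, 1.0) if keep else 0.0 for keep in mask]
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    projected_grad = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)
    return [p - lr * projected_grad * zi for p, zi in zip(params, z)]


if __name__ == "__main__":
    target = [1.0, -2.0, 3.0, 0.5]

    def loss(theta):
        # Toy quadratic objective standing in for a task loss.
        return sum((t - c) ** 2 for t, c in zip(theta, target))

    rng = random.Random(0)  # one shared generator so each step draws a fresh z
    theta = [0.0, 0.0, 0.0, 0.0]
    mask = [True, False, True, False]  # hypothetical: update params 0 and 2 only
    for _ in range(1000):
        theta = zo_masked_step(theta, loss, mask, rng)
    print(theta)  # params 1 and 3 remain 0.0 because they are masked out
```

Masking shrinks the dimension of the random search direction, which is one way to reduce the variance of the ZO estimate; the random generator is passed in rather than created per call so that successive steps do not reuse the same direction.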
2021
Revisiting Pretraining with Adapters
Seungwon Kim | Alex Shum | Nathan Susanj | Jonathan Hilgart
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Pretrained language models have served as the backbone for many state-of-the-art NLP results. These models are large and expensive to train. Recent work suggests that continued pretraining on task-specific data is worth the effort, as it leads to improved performance on downstream tasks. We explore alternatives to full-scale task-specific pretraining of language models through the use of adapter modules, a parameter-efficient approach to transfer learning. We find that adapter-based pretraining is able to achieve comparable results to task-specific pretraining while using a fraction of the overall trainable parameters. We further explore direct use of adapters without pretraining and find that direct fine-tuning performs mostly on par with pretrained adapter models, contradicting the previously proposed benefits of continual pretraining in full-model pretraining and fine-tuning strategies. Lastly, we perform an ablation study on task-adaptive pretraining to investigate how different hyperparameter settings can change the effectiveness of the pretraining.
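An adapter module is typically a small bottleneck network inserted into each Transformer layer while the pretrained backbone stays frozen. The sketch below assumes the common Houlsby-style design (down-projection from size d to a bottleneck of size r, a nonlinearity, here ReLU, an up-projection back to d, and a residual connection); the abstract does not say which adapter variant was used, and the helper names are hypothetical. It also counts trainable parameters to show why adapters need only a fraction of a full layer's weights.

```python
def adapter_forward(h, W_down, b_down, W_up, b_up):
    """Bottleneck adapter: h + W_up @ relu(W_down @ h + b_down) + b_up.

    h: hidden vector of size d; W_down: r x d; W_up: d x r.
    """
    z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
         for row, b in zip(W_down, b_down)]
    return [x + sum(w * zi for w, zi in zip(row, z)) + b
            for x, row, b in zip(h, W_up, b_up)]


def adapter_param_count(d, r):
    """Trainable parameters of one adapter (both projections plus biases)."""
    return d * r + r + r * d + d


def dense_param_count(d_in, d_out):
    """Parameters of one d_in x d_out dense layer with bias, for comparison."""
    return d_in * d_out + d_out
```

With a zero-initialized up-projection the adapter starts as the identity map, so inserting it does not perturb the pretrained model. For illustrative sizes d = 768 and r = 64, one adapter has 99,136 trainable parameters, versus 590,592 for a single 768 × 768 dense layer.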
Co-authors
- Athanasios Mouchtaris 3
- Rupak Vignesh Swaminathan 3
- Zheng Zhang 3
- Kai Zhen 3
- Siegfried Kunzmann 2
- Markus Müller 2
- Yifan Yang 2
- Siddhant Arora 1
- Sravan Babu Bodapati 1
- Samuele Cornell 1
- Jack FitzGerald 1
- Aram Galstyan 1
- Bhavana Ganesh 1
- Jonathan Hilgart 1
- Chien-yu Huang 1
- Goeric Huybrechts 1
- Seungwon Kim 1
- Abhishek Kumar 1
- Jonas M. Kübler 1
- Jing Liu (刘晶, 刘璟) 1
- Alex Shum 1
- Ryan Solgi 1
- Masao Someki 1
- Grant Strimel 1
- Shinji Watanabe 1
- Zhen Zhang 1