Shusen Liu


2024

pdf
RankMean: Module-Level Importance Score for Merging Fine-tuned LLM Models
Gabriel Perin | Xuxi Chen | Shusen Liu | Bhavya Kailkhura | Zhangyang Wang | Brian Gallagher
Findings of the Association for Computational Linguistics ACL 2024

Traditionally, developing new language models (LMs) capable of addressing multiple tasks involves fine-tuning pre-trained LMs using a wide collection of datasets, a process that often incurs significant computational expenses. Model merging emerges as a cost-effective alternative, allowing the integration of existing models fine-tuned on different tasks into a single model that performs well across all tasks, eliminating the need for additional training. In this paper, we propose RankMean, an algorithm for merging fine-tuned LMs without requiring any downstream data. RankMean determines merging coefficients based on the relative rankings of weight change magnitudes and applies these coefficients for module-wise integration of various fine-tuned models. Our experimental results demonstrate that RankMean outperforms existing baseline methods on multiple benchmarks. The code is available at https://github.com/VITA-Group/RankMean.

2018

pdf
Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension
Shusen Liu | Tao Li | Zhimin Li | Vivek Srikumar | Valerio Pascucci | Peer-Timo Bremer
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Neural networks models have gained unprecedented popularity in natural language processing due to their state-of-the-art performance and the flexible end-to-end training scheme. Despite their advantages, the lack of interpretability hinders the deployment and refinement of the models. In this work, we present a flexible visualization library for creating customized visual analytic environments, in which the user can investigate and interrogate the relationships among the input, the model internals (i.e., attention), and the output predictions, which in turn shed light on the model decision-making process.