Seoha Song
2025
zFLoRA: Zero-Latency Fused Low-Rank Adapters
Dhananjaya Gowda | Seoha Song | Harshith Goka | Junhyun Lee
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) are increasingly deployed with task-specific adapters catering to multiple downstream applications. In such a scenario, the additional compute associated with this apparently insignificant number of adapter parameters (typically less than 1% of the base model) turns out to be disproportionately significant during inference (up to 2.5x that of the base model). In this paper, we propose a new zero-latency fused low-rank adapter (zFLoRA) that introduces zero or negligible latency overhead on top of the base model. Experimental results on LLMs of size 1B, 3B and 7B show that zFLoRA compares favorably against popular supervised fine-tuning benchmarks, including low-rank adapters (LoRA) as well as full fine-tuning (FFT). Experiments are conducted on 18 different tasks across three categories, namely commonsense reasoning, math reasoning and summary-dialogue. Latency measurements made on NPU (Samsung Galaxy S25+) as well as GPU (NVIDIA H100) platforms show that the proposed zFLoRA adapters introduce zero to negligible latency overhead.
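The abstract does not detail the fusion mechanism itself, so as a reference point the sketch below shows a standard (unfused) LoRA adapter applied to a linear layer: the extra low-rank matmuls through A and B are the adapter compute whose inference cost motivates fusing. The class name, shapes and hyperparameters are illustrative assumptions, not the zFLoRA implementation.

```python
# Minimal sketch of a standard (unfused) LoRA adapter on a linear layer.
# Illustrates the extra low-rank matmuls (A, B) that add inference latency;
# this is NOT the zFLoRA fusion described in the paper.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)    # frozen pretrained weight W
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Linear(d_in, rank, bias=False)   # low-rank down-projection A
        self.lora_B = nn.Linear(rank, d_out, bias=False)  # low-rank up-projection B
        nn.init.zeros_(self.lora_B.weight)                 # adapter starts as a zero update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x  -- the second term is the adapter overhead
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

if __name__ == "__main__":
    layer = LoRALinear(d_in=2048, d_out=2048, rank=16)
    y = layer(torch.randn(1, 8, 2048))
    print(y.shape)  # torch.Size([1, 8, 2048])
```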
2020
Contextual Augmentation of Pretrained Language Models for Emotion Recognition in Conversations
Jonggu Kim | Hyeonmok Ko | Seoha Song | Saebom Jang | Jiyeon Hong
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media
Since language model pretraining for learning contextualized word representations was proposed, pretrained language models have achieved success in many natural language processing tasks, because the contextualized representations from their self-attention layers provide a strong initialization for downstream tasks. However, the use of pretrained language models for emotion recognition in conversations has not been studied sufficiently. We first use ELECTRA, a state-of-the-art pretrained language model, and validate its performance on emotion recognition in conversations. Furthermore, we propose contextual augmentation of pretrained language models for emotion recognition in conversations, which considers not only previous utterances but also conversation-related information such as speakers, speech acts and topics. We classify the information based on what it relates to, and propose positions in the entire input sequence for the words corresponding to that information. To validate the proposed method, we conduct experiments on the DailyDialog dataset, which contains abundant annotated conversation information. The experiments show that the proposed method achieves state-of-the-art F1 scores on the dataset and significantly improves performance.
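To make the idea of contextual augmentation concrete, the sketch below assembles an input sequence from previous utterances plus conversation-related information (speaker, speech act, topic) before tokenization. The special tokens and their positions are assumptions for illustration only; the paper defines the actual placement of this information in the input sequence.

```python
# Illustrative sketch: build an augmented input for emotion recognition in
# conversations from dialogue history plus speaker / speech-act / topic labels.
# The marker tokens ([TOPIC], [SPK], [ACT], [UTT]) are hypothetical.
from typing import Dict, List

def build_augmented_input(history: List[Dict[str, str]],
                          current: Dict[str, str],
                          topic: str) -> str:
    parts = [f"[TOPIC] {topic}"]
    for utt in history:  # previous utterances provide conversational context
        parts.append(f"[SPK] {utt['speaker']} [ACT] {utt['act']} [UTT] {utt['text']}")
    # the target utterance whose emotion is to be classified comes last
    parts.append(f"[SPK] {current['speaker']} [ACT] {current['act']} [UTT] {current['text']}")
    return " [SEP] ".join(parts)

if __name__ == "__main__":
    history = [{"speaker": "A", "act": "question", "text": "Did you see the game last night?"}]
    current = {"speaker": "B", "act": "inform", "text": "Yes, it was amazing!"}
    print(build_augmented_input(history, current, topic="sports"))
    # The resulting string would then be tokenized and fed to a model such as ELECTRA.
```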