2025
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting
Mohammad Jahid Ibna Basher | Md Kowsher | Md Saiful Islam | Rabindra Nath Nandi | Nusrat Jahan Prottasha | Mehadi Hasan Menon | Tareq Al Muntasir | Shammur Absar Chowdhury | Firoj Alam | Niloofar Yousefi | Ozlem Garibay
Findings of the Association for Computational Linguistics: NAACL 2025
This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for Bangla speaker adaptation-based TTS, designed to bridge the gap in Bangla speech synthesis using minimal training data. Building upon the XTTS architecture, our approach integrates Bangla into a multilingual TTS pipeline, with modifications to account for the phonetic and linguistic characteristics of the language. We pretrain BnTTS on 3.85k hours of Bangla speech data with corresponding text labels and evaluate performance in both zero-shot and few-shot settings on our proposed test dataset. Empirical evaluations in few-shot settings show that BnTTS significantly improves the naturalness, intelligibility, and speaker fidelity of synthesized Bangla speech. Compared to state-of-the-art Bangla TTS systems, BnTTS exhibits superior performance in Subjective Mean Opinion Score (SMOS), Naturalness, and Clarity metrics.
Does Self-Attention Need Separate Weights in Transformers?
Md Kowsher | Nusrat Jahan Prottasha | Chun-Nam Yu | Ozlem Garibay | Niloofar Yousefi
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Self-attention has revolutionized natural language processing by capturing long-range dependencies and improving context understanding. However, it comes with high computational costs and struggles with sequential data’s inherent directionality. This paper presents a simplified approach called “shared weight self-attention,” where a single weight matrix is used for Keys, Queries, and Values instead of separate matrices for each. This approach cuts training parameters by more than half and significantly reduces training time. Our method not only improves efficiency but also achieves strong performance on tasks from the GLUE benchmark, even outperforming the standard BERT baseline in handling noisy and out-of-domain data. Experimental results show a 66.53% reduction in parameter size within the attention block and competitive accuracy improvements of 3.55% and 0.89% over symmetric and pairwise attention-based BERT models, respectively.
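
As a rough illustration of the idea in the abstract, the following PyTorch sketch uses one shared projection matrix for Keys, Queries, and Values in place of three separate ones; the module name, head configuration, and output projection are assumptions for illustration, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedWeightSelfAttention(nn.Module):
    """Self-attention where a single projection is shared by Q, K, and V.

    Illustrative sketch only; hyperparameters and the output projection are
    assumptions, not the configuration reported in the paper.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # One weight matrix replaces the usual separate W_Q, W_K, W_V.
        self.shared_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        h = self.shared_proj(x)  # one projection serves all three roles
        h = h.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q = k = v = h            # Q, K, V share the same representation
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)

# Example: 8 sequences of 16 tokens with hidden size 256 and 4 heads.
layer = SharedWeightSelfAttention(d_model=256, n_heads=4)
y = layer(torch.randn(8, 16, 256))  # (8, 16, 256)

A standard attention block carries three d_model x d_model projection matrices for Q, K, and V; sharing a single matrix removes two of them, which is consistent with the roughly two-thirds reduction in attention-block parameters reported in the abstract.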
LLM-Mixer: Multiscale Mixing in LLMs for Time Series Forecasting
Md Kowsher | Md. Shohanur Islam Sobuj | Nusrat Jahan Prottasha | E. Alejandro Alanis | Ozlem Garibay | Niloofar Yousefi
Proceedings of the 4th Table Representation Learning Workshop
Time series forecasting is a challenging task, especially when dealing with data that contains both short-term variations and long-term trends. In this study, we introduce LLM-Mixer, a novel framework that combines multiscale time-series decomposition with the power of pre-trained Large Language Models (LLMs). LLM-Mixer breaks down time-series data into multiple temporal resolutions using downsampling and processes these multiscale representations with a frozen LLM, guided by a carefully designed text prompt that encodes information about the dataset’s features and structure. To understand the role of downsampling, we conduct a detailed analysis using Neural Tangent Kernel (NTK) distance, showing that incorporating multiple scales improves the model’s learning dynamics. We evaluate LLM-Mixer across a diverse set of forecasting tasks, including long-term multivariate, short-term multivariate, and long-term univariate scenarios. Experimental results demonstrate that LLM-Mixer achieves competitive performance compared to recent state-of-the-art models across various forecasting horizons.
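
A minimal sketch of the multiscale step described in the abstract, assuming average pooling as the downsampling operator and a linear layer to map each resolution into the LLM's embedding space; the scale factors, layer names, and dimensions are illustrative assumptions, and the frozen LLM and text prompt are only indicated in a comment.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleDecomposition(nn.Module):
    # Downsample a multivariate series to several temporal resolutions and
    # project each resolution into the embedding space of a (frozen) LLM.
    # Hypothetical sketch; not the paper's exact architecture.
    def __init__(self, n_channels: int, d_model: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.proj = nn.Linear(n_channels, d_model)  # assumed embedding layer

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, n_channels)
        views = []
        for s in self.scales:
            down = x if s == 1 else F.avg_pool1d(
                x.transpose(1, 2), kernel_size=s).transpose(1, 2)
            views.append(self.proj(down))  # (batch, seq_len // s, d_model)
        return views

# Example: three resolutions of a 96-step, 7-channel series.
series = torch.randn(8, 96, 7)
multiscale_tokens = MultiscaleDecomposition(n_channels=7, d_model=768)(series)
# These token sequences, together with an embedded text prompt describing the
# dataset, would then be processed by a frozen pre-trained LLM backbone.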