SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data Constraints

Victor Adelakun Omolaoye, Babajide Alamu Owoyele, Gerard de Melo


Abstract
Scaling data and model size has driven recent advances in language modeling, but this strategy falters in scenarios with strict data constraints, as in the BabyLM Challenge. Insights from Chinchilla highlight that smaller models trained on more data outperform larger counterparts trained inadequately, underscoring the need for compact architectures. Furthermore, while embedding weight tying is a common parameter-saving technique, we find that it significantly diminishes linguistic competence in compact models. In response, we explore alternative architectural strategies that preserve the parameter efficiency of tied models without sacrificing the representational benefits of untied embeddings. We introduce SLlama, a Llama3 architecture variant that incorporates targeted modifications to compress Transformer components: Repeated Reduced Hidden Size and Projection (RRHP), Permutated Weight Attention (PWA), Shared Projection Multi-Layer Perceptron (SPMLP), and Layer Weight Sharing. Without relying on distillation, SLlama achieves a 31.72% improvement in linguistic knowledge acquisition over the BabyLlama baseline, with a comparable GLUE score and a significantly lower parameter count. These results demonstrate that well-designed, compact models can rival larger ones under strict data constraints.
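The abstract contrasts embedding weight tying with alternative parameter-sharing strategies such as layer weight sharing. The following is a minimal PyTorch sketch, not the paper's implementation, that illustrates the general trade-off: tying removes the output-projection matrix, while sharing one Transformer block across depth recovers parameters even with untied embeddings. All names here (TinyDecoder, tie_embeddings, share_layers, layer counts and sizes) are illustrative assumptions.

```python
# Illustrative sketch only; SLlama's actual RRHP/PWA/SPMLP modules are not reproduced here.
import torch
import torch.nn as nn


class TinyDecoder(nn.Module):
    def __init__(self, vocab_size=256, d_model=64, n_heads=4,
                 n_layers=4, tie_embeddings=True, share_layers=False):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        def block():
            return nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)

        if share_layers:
            # Layer weight sharing: the same block (same parameters) is applied at every depth.
            shared = block()
            self.layers = nn.ModuleList([shared] * n_layers)
        else:
            self.layers = nn.ModuleList([block() for _ in range(n_layers)])

        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        if tie_embeddings:
            # Embedding weight tying: the output head reuses the input embedding matrix.
            self.lm_head.weight = self.embed.weight

    def forward(self, ids):
        x = self.embed(ids)
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        for layer in self.layers:
            x = layer(x, src_mask=mask)
        return self.lm_head(x)


def n_params(model):
    # parameters() deduplicates shared tensors, so tied/shared weights are counted once.
    return sum(p.numel() for p in model.parameters())


if __name__ == "__main__":
    print("tied, no sharing:   ", n_params(TinyDecoder(tie_embeddings=True)))
    print("untied, no sharing: ", n_params(TinyDecoder(tie_embeddings=False)))
    print("untied, layer share:", n_params(TinyDecoder(tie_embeddings=False, share_layers=True)))
```

Comparing the three parameter counts shows why a compact model might prefer sharing inside the Transformer stack over tying the embeddings: the savings are of a similar order, but the input and output embeddings remain free to specialize.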
Anthology ID:
2025.emnlp-main.1198
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
23491–23506
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1198/
Cite (ACL):
Victor Adelakun Omolaoye, Babajide Alamu Owoyele, and Gerard de Melo. 2025. SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data Constraints. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 23491–23506, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data Constraints (Omolaoye et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1198.pdf
Checklist:
2025.emnlp-main.1198.checklist.pdf