SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data Constraints
Victor Adelakun Omolaoye, Babajide Alamu Owoyele, Gerard de Melo
Abstract
Scaling data and model size has driven recent advances in language modeling, but this strategy falters in scenarios with strict data constraints, such as the BabyLM Challenge. Insights from Chinchilla highlight that smaller models trained on more data outperform larger counterparts trained inadequately, emphasizing the need for compact architectures. Furthermore, while embedding weight tying is a common parameter-saving technique, we find that it significantly diminishes linguistic competence in compact models. In response, we explore alternative architectural strategies that preserve the parameter efficiency of tied models without sacrificing the representational benefits of untied embeddings. Consequently, we introduce SLlama, a Llama3 architecture variant that incorporates targeted modifications to compress Transformer components: Repeated Reduced Hidden Size and Projection (RRHP), Permutated Weight Attention (PWA), Shared Projection Multi-Layer Perceptron (SPMLP), and Layer Weight Sharing. Without relying on distillation, SLlama achieves a 31.72% improvement in linguistic knowledge acquisition over the BabyLlama baseline, with a comparable GLUE score and a significantly lower parameter count. These results demonstrate that well-designed, compact models can rival larger ones under strict data constraints.
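As a rough illustration of the weight-tying trade-off mentioned in the abstract, the sketch below is a generic PyTorch toy model, not the SLlama implementation; the `TinyLM` class and all hyperparameters are hypothetical. It shows that tying the output projection to the input embedding removes roughly vocab_size × hidden parameters, which is the saving that untied alternatives would need to recover through other architectural means.

```python
# Generic illustration of embedding weight tying (not the SLlama code):
# tying the LM head to the input embedding removes vocab_size * hidden
# parameters, at the cost of forcing one matrix to serve both roles.
import torch
import torch.nn as nn

class TinyLM(nn.Module):  # hypothetical toy model, used only for parameter counting
    def __init__(self, vocab_size=32000, hidden=256, tie_embeddings=True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
        if tie_embeddings:
            # Share one weight matrix between the input embedding and the output head.
            self.lm_head.weight = self.embed.weight

    def forward(self, ids):
        return self.lm_head(self.backbone(self.embed(ids)))

tied = TinyLM(tie_embeddings=True)
untied = TinyLM(tie_embeddings=False)
print(sum(p.numel() for p in tied.parameters()))    # shared matrix counted once
print(sum(p.numel() for p in untied.parameters()))  # ~8.2M more (32000 * 256)
```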
- Anthology ID:
- 2025.emnlp-main.1198
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 23491–23506
- Language:
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1198/
- DOI:
- Cite (ACL):
- Victor Adelakun Omolaoye, Babajide Alamu Owoyele, and Gerard de Melo. 2025. SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data Constraints. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 23491–23506, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data Constraints (Omolaoye et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1198.pdf