Joonas Tapaninaho
2025
MoEP: Modular Expert Paths for Sample-Efficient Language Modeling
Joonas Tapaninaho
Proceedings of the First BabyLM Workshop
Training language models under tight compute budgets and on small training datasets remains challenging for dense decoder-only Transformers, where every token activates the full stack of model parameters. We introduce MoEP (Modular Expert Paths), a sparse decoder-only architecture that enables more selective token activation, improving model performance and accelerating learning without increasing the total number of parameters. We show that combining model parallelism with Mixture-of-Experts (MoE)-style linear projections and a lightweight top-k router outperforms a GPT-2 baseline and reaches stable evaluation performance more quickly.
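To make the abstract's key mechanism concrete, here is a minimal sketch of a top-k routed MoE linear layer in PyTorch. It is an illustrative reconstruction from the description alone ("MoE-style linear projections and a lightweight top-k router"); the class name, hyperparameters, and routing details are assumptions, not the paper's actual implementation.

```python
# Minimal top-k routed Mixture-of-Experts linear layer (illustrative sketch;
# names and defaults are assumptions, not taken from the MoEP paper).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELinear(nn.Module):
    def __init__(self, d_model: int, d_out: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # One linear "expert" projection per expert.
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_out) for _ in range(n_experts)
        )
        # Lightweight router: a single linear layer scoring experts per token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to one row per token.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                # (T, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # k experts per token
        weights = F.softmax(weights, dim=-1)        # normalize over chosen k
        out = torch.zeros(tokens.shape[0], self.experts[0].out_features,
                          device=x.device, dtype=x.dtype)
        for e, expert in enumerate(self.experts):
            # Find tokens that routed to expert e (each at most once per expert).
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += (
                weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
            )
        return out.reshape(*x.shape[:-1], -1)
```

Only k of the n_experts projections run per token, which is how such a layer keeps per-token compute sparse while the total parameter count stays fixed, matching the trade-off the abstract describes.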