SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation

Mahi Luthra, Jiayi Shen, Maxime Poli, Angelo Ortiz Tandazo, Yosuke Higuchi, Youssef Benchekroun, Martin Gleize, Charles-Éric Saint-James, Dongyan Lin, Phillip Rust, Angel Villar-Corrales, Surya, Vanessa Stark, Rashel Moritz, Juan Pino, Yann LeCun, Emmanuel Dupoux


Abstract
Human infants, with only a few hundred hours of speech exposure, acquire basic units of new languages, highlighting a striking efficiency gap compared to data-hungry self-supervised speech models. To address this gap, this paper introduces SpidR-Adapt for rapid adaptation of speech units to new languages using minimal unlabeled data. We cast such low-resource speech representation learning as a meta-learning problem and construct a multi-task adaptive pre-training (MAdaPT) protocol, which formulates the adaptation process as a bi-level optimization framework. To enable scalable meta-training under this framework, we propose a novel heuristic solution, first-order bi-level optimization (FOBLO), which avoids heavy computation costs. Finally, we stabilize meta-training with a robust initialization obtained through interleaved supervision, which alternates self-supervised and supervised objectives. Empirically, SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and downstream spoken language modeling scores (sWUGGY, sBLIMP, tSC), surpassing in-domain toplines after training on less than 1h of target-language audio and delivering 100× greater data efficiency than standard multi-task training. These findings highlight a practical, architecture-agnostic path toward biologically inspired, data-efficient representations. We open-source the training code and model checkpoints at https://github.com/facebookresearch/spidr-adapt.
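The bi-level structure mentioned in the abstract can be illustrated with a toy sketch. This is not FOBLO as defined in the paper, whose details are not given here; it is a generic first-order approximation in the style of first-order MAML, applied to hypothetical one-dimensional quadratic "tasks". All names (`grad`, `adapt`, `meta_train`, `task_targets`) and all hyperparameters are assumptions made for illustration only.

```python
def grad(theta, target):
    # Gradient of the toy task loss 0.5 * (theta - target) ** 2.
    return theta - target


def adapt(theta, target, inner_lr=0.1, inner_steps=5):
    # Inner level: a few gradient steps on one task,
    # starting from the shared initialization theta.
    for _ in range(inner_steps):
        theta = theta - inner_lr * grad(theta, target)
    return theta


def meta_train(task_targets, meta_lr=0.05, meta_steps=500):
    # Outer level: update the shared initialization so that a short
    # inner-loop adaptation works well across all tasks.
    theta = 0.0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for target in task_targets:
            adapted = adapt(theta, target)
            # First-order approximation: take the task gradient at the
            # adapted parameters and ignore how `adapted` depends on theta
            # (no second-order terms are computed).
            meta_grad += grad(adapted, target)
        theta = theta - meta_lr * meta_grad / len(task_targets)
    return theta


# Hypothetical tasks: each "language" is reduced to a scalar optimum.
task_targets = [-2.0, 0.0, 5.0]
meta_init = meta_train(task_targets)
adapted_to_new = adapt(meta_init, 5.0)
```

In this toy setting the learned initialization settles at a point from which every task can be reached in a few inner steps, which is the intuition behind treating adaptation as the inner problem and the choice of initialization as the outer problem.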
Anthology ID:
2026.acl-long.1325
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
28705–28728
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1325/
Cite (ACL):
Mahi Luthra, Jiayi Shen, Maxime Poli, Angelo Ortiz Tandazo, Yosuke Higuchi, Youssef Benchekroun, Martin Gleize, Charles-Éric Saint-James, Dongyan Lin, Phillip Rust, Angel Villar-Corrales, Surya, Vanessa Stark, Rashel Moritz, Juan Pino, Yann LeCun, and Emmanuel Dupoux. 2026. SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28705–28728, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation (Luthra et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1325.pdf
Checklist:
 2026.acl-long.1325.checklist.pdf