Learning from Child-directed Speech in Two-language Scenarios: A French-English Case-Study

Liel Binyamin, Elior Sulem


Abstract
Research on developmentally plausible language models has so far centered on English, leaving open questions about multilingual settings. We present a systematic study of compact models by extending BabyBERTa to English–French scenarios under strictly size-matched data conditions, addressing monolingual, bilingual, and cross-lingual settings. Our design contrasts two corpus types: (i) child-directed speech (2.5M tokens), following BabyBERTa and related work, and (ii) multi-domain corpora (10M tokens), extending the BabyLM framework to French. To support fair evaluation, we also introduce new resources: French versions of QAMR and QASRL, and an English and French multi-domain corpus. We evaluate the models on both syntactic and semantic tasks, comparing with Wikipedia-only training. Results reveal context-dependent effects: training on Wikipedia consistently favors semantic tasks, while child-directed speech improves grammatical judgments in monolingual settings. Bilingual pretraining yields notable gains for textual entailment, disproportionately benefiting French. Importantly, the same relative patterns are observed across BabyBERTa, RoBERTa, and LTG-BERT, indicating consistent trends across the tested architectures.
Anthology ID:
2026.findings-eacl.337
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6412–6426
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.337/
Cite (ACL):
Liel Binyamin and Elior Sulem. 2026. Learning from Child-directed Speech in Two-language Scenarios: A French-English Case-Study. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6412–6426, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Learning from Child-directed Speech in Two-language Scenarios: A French-English Case-Study (Binyamin & Sulem, Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.337.pdf
Checklist:
 2026.findings-eacl.337.checklist.pdf