Less Mature is More Adaptable for Sentence-level Language Modeling

Abhilasha Sancheti, David Dale, Artyom Kozhevnikov, Maha Elbayad


Abstract
This work investigates sentence-level models (i.e., models that operate at the sentence level) to study how sentence representations from various encoders influence downstream task performance, and which syntactic, semantic, and discourse-level properties are essential for strong performance. Our experiments encompass encoders with diverse training regimes and pretraining domains, as well as various pooling strategies applied to multi-sentence input tasks (including sentence ordering, sentiment classification, and natural language inference) requiring coarse-to-fine-grained reasoning. We find that “less mature” representations (e.g., mean-pooled representations from BERT’s first or last layer, or representations from encoders with limited fine-tuning) exhibit greater generalizability and adaptability to downstream tasks compared to representations from extensively fine-tuned models (e.g., SBERT or SimCSE). These findings are consistent across different pretraining seed initializations for BERT. Our probing analysis reveals that syntactic and discourse-level properties are stronger indicators of downstream performance than MTEB scores or decodability. Furthermore, sentence-level models are more data- and time-efficient than token-level models, often outperforming them, underscoring their potential for future research.
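As a point of reference for the “mean-pooled representations from BERT’s first or last layer” mentioned above, the following is a minimal, hypothetical Python sketch using Hugging Face transformers; it is not the authors’ code, and the model name, layer indexing, and masking details are assumptions for illustration only.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model; the paper's exact encoder checkpoints may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def mean_pooled_sentence_embedding(sentence: str, layer: int) -> torch.Tensor:
    """Mean-pool token states from one BERT layer, ignoring padding tokens."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple: index 0 = embedding layer,
    # index 1 = first encoder layer, index -1 = last encoder layer.
    states = outputs.hidden_states[layer]          # (1, seq_len, hidden_dim)
    mask = inputs["attention_mask"].unsqueeze(-1)  # (1, seq_len, 1)
    return (states * mask).sum(dim=1) / mask.sum(dim=1)

first_layer_emb = mean_pooled_sentence_embedding("A sentence to encode.", layer=1)
last_layer_emb = mean_pooled_sentence_embedding("A sentence to encode.", layer=-1)
```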
Anthology ID:
2025.acl-long.573
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
11680–11695
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.573/
Cite (ACL):
Abhilasha Sancheti, David Dale, Artyom Kozhevnikov, and Maha Elbayad. 2025. Less Mature is More Adaptable for Sentence-level Language Modeling. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11680–11695, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Less Mature is More Adaptable for Sentence-level Language Modeling (Sancheti et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.573.pdf