Less Mature is More Adaptable for Sentence-level Language Modeling
Abhilasha Sancheti, David Dale, Artyom Kozhevnikov, Maha Elbayad
Abstract
This work investigates sentence-level models (i.e., models that operate on sentences rather than individual tokens) to study how sentence representations from various encoders influence downstream task performance, and which syntactic, semantic, and discourse-level properties are essential for strong performance. Our experiments encompass encoders with diverse training regimes and pretraining domains, as well as various pooling strategies applied to multi-sentence input tasks (including sentence ordering, sentiment classification, and natural language inference) that require coarse-to-fine-grained reasoning. We find that "less mature" representations (e.g., mean-pooled representations from BERT's first or last layer, or representations from encoders with limited fine-tuning) generalize and adapt to downstream tasks better than representations from extensively fine-tuned models (e.g., SBERT or SimCSE). These findings are consistent across different pretraining seed initializations for BERT. Our probing analysis reveals that syntactic and discourse-level properties are stronger indicators of downstream performance than MTEB scores or decodability. Furthermore, sentence-level models are data- and time-efficient and often outperform token-level models, underscoring their potential for future research.
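As a concrete illustration of the "less mature" representations the abstract refers to, the sketch below mean-pools token states from BERT's first or last hidden layer. This is a minimal sketch assuming the HuggingFace transformers API; the model name, layer indices, and helper function are illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch: mean-pooled sentence embeddings from a chosen BERT layer.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

def sentence_embedding(sentences, layer=0):
    """Mean-pool token states from one hidden layer (0 = embedding layer, -1 = last)."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).hidden_states[layer]      # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # exclude padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (batch, dim)

emb_first = sentence_embedding(["An example sentence."], layer=0)
emb_last = sentence_embedding(["An example sentence."], layer=-1)
```

Such pooled vectors, taken from a lightly or un-fine-tuned encoder, are the kind of representation the paper contrasts with heavily fine-tuned alternatives like SBERT or SimCSE.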
- Anthology ID:
- 2025.acl-long.573
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 11680–11695
- URL:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.573/
- Cite (ACL):
- Abhilasha Sancheti, David Dale, Artyom Kozhevnikov, and Maha Elbayad. 2025. Less Mature is More Adaptable for Sentence-level Language Modeling. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11680–11695, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Less Mature is More Adaptable for Sentence-level Language Modeling (Sancheti et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.573.pdf