Abstract
Lexical semantic relations (LSRs) characterize meaning relationships between words and play an important role in systematic generalization on lexical inference tasks. Notably, several tasks that require knowledge of hypernymy still pose a challenge for pretrained language models (LMs) such as BERT, underscoring the need to better align their linguistic behavior with our knowledge of LSRs. In this paper, we propose Balaur, a model that addresses this challenge by modeling LSRs directly in the LM’s hidden states throughout pretraining. Motivating our approach is the hypothesis that the internal representations of LMs can provide an interface to their observable linguistic behavior, and that by controlling one we can influence the other. We validate our hypothesis and demonstrate that Balaur generally improves the performance of large transformer-based LMs on a comprehensive set of hypernymy-informed tasks, as well as on the original LM objective. Code and data are made available at https://github.com/mirandrom/balaur
- Anthology ID:
- 2023.findings-emnlp.674
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10054–10070
- URL:
- https://aclanthology.org/2023.findings-emnlp.674
- DOI:
- 10.18653/v1/2023.findings-emnlp.674
- Cite (ACL):
- Andrei Mircea and Jackie Cheung. 2023. Balaur: Language Model Pretraining with Lexical Semantic Relations. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10054–10070, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Balaur: Language Model Pretraining with Lexical Semantic Relations (Mircea & Cheung, Findings 2023)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2023.findings-emnlp.674.pdf