HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning

Manish Bhattarai, Ryan Barron, Maksim E. Eren, Minh N. Vu, Vesselin Grantcharov, Ismael Ismael, Valentin Stanev, Cynthia Matuszek, Vladimir I Valtchinov, Kim Rasmussen, Boian S. Alexandrov


Abstract
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain’s specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes a contrastive loss at each level (depth) of the label hierarchy and incorporates hierarchical penalties to align embeddings with the underlying relationships in that hierarchy. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Materials Science, Cybersecurity, and Applied Mathematics.
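The idea of a per-level contrastive loss combined with per-level penalties can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a standard supervised contrastive loss at each hierarchy level, cluster labels supplied directly (rather than derived from hierarchical fuzzy clustering with matrix factorization), and hand-picked level weights. The names `supcon_loss`, `hierarchical_loss`, `level_weights`, and `temperature` are hypothetical.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss over unit-norm embeddings z of shape (n, d).

    Samples sharing a label are positives; all other samples are negatives.
    Anchors with no positive are skipped. Requires n >= 2.
    """
    n = z.shape[0]
    sim = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)
    # Log of the softmax denominator over every sample except the anchor itself.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())

def hierarchical_loss(embeddings, level_labels, level_weights, temperature=0.1):
    """Weighted sum of per-level contrastive losses.

    level_labels[l] holds each sample's cluster label at depth l.
    level_weights[l] is an assumed penalty for depth l (illustrative only).
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return sum(
        w * supcon_loss(z, np.asarray(lab), temperature)
        for w, lab in zip(level_weights, level_labels)
    )

# Toy data: two coarse clusters (depth 0), each split into two leaves (depth 1).
emb = np.array([
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.3], [0.9, 0.4],
    [0.0, 1.0], [0.1, 0.9], [0.3, 1.0], [0.4, 0.9],
])
coarse = [0, 0, 0, 0, 1, 1, 1, 1]
fine = [0, 0, 1, 1, 2, 2, 3, 3]
loss = hierarchical_loss(emb, [coarse, fine], level_weights=[0.7, 0.3])
```

In this sketch the weights give the coarse level more influence than the fine level. That is one possible choice; the weighting scheme used in the paper should be taken from the PDF rather than from this example.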
Anthology ID:
2025.knowledgenlp-1.19
Volume:
Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
Weijia Shi, Wenhao Yu, Akari Asai, Meng Jiang, Greg Durrett, Hannaneh Hajishirzi, Luke Zettlemoyer
Venues:
KnowledgeNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
205–214
URL:
https://preview.aclanthology.org/landing_page/2025.knowledgenlp-1.19/
Cite (ACL):
Manish Bhattarai, Ryan Barron, Maksim E. Eren, Minh N. Vu, Vesselin Grantcharov, Ismael Ismael, Valentin Stanev, Cynthia Matuszek, Vladimir I Valtchinov, Kim Rasmussen, and Boian S. Alexandrov. 2025. HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning. In Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing, pages 205–214, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning (Bhattarai et al., KnowledgeNLP 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.knowledgenlp-1.19.pdf