Abstract
Phrase representations play an important role in data science and natural language processing, benefiting various tasks such as Entity Alignment, Record Linkage, Fuzzy Joins, and Paraphrase Classification. The current state-of-the-art method involves fine-tuning pre-trained language models for phrasal embeddings using contrastive learning. However, we have identified areas for improvement. First, these pre-trained models tend to be unnecessarily complex and require pre-training on a corpus with context sentences. Second, leveraging the phrase type and morphology gives phrase representations that are both more precise and more flexible. We propose an improved framework to learn phrase representations in a context-free fashion. The framework employs phrase type classification as an auxiliary task and incorporates character-level information more effectively into the phrase representation. Furthermore, we design three granularities of data augmentation to increase the diversity of training samples. Our experiments across a wide range of tasks reveal that our approach generates superior phrase embeddings compared to previous methods while requiring a smaller model size.
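The abstract describes a multi-task setup: contrastive learning over phrase pairs combined with phrase type classification as an auxiliary objective. Below is a minimal sketch of how such a combined loss could look, assuming a PyTorch-style setup; the encoder, the number of type classes, and the loss weight are hypothetical placeholders, not the authors' actual implementation.

```python
# Minimal sketch (not the authors' code): contrastive phrase-pair loss
# plus an auxiliary phrase-type classification loss.
# PhraseEncoder, NUM_TYPES, ALPHA, and TAU are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TYPES = 5   # hypothetical number of phrase-type classes
ALPHA = 0.1     # hypothetical weight on the auxiliary loss
TAU = 0.05      # temperature for the contrastive loss

class PhraseEncoder(nn.Module):
    """Toy encoder: mean-pools token embeddings into a phrase vector."""
    def __init__(self, vocab_size=30522, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.type_head = nn.Linear(dim, NUM_TYPES)

    def forward(self, token_ids):                  # (batch, seq_len)
        vecs = self.emb(token_ids).mean(dim=1)     # (batch, dim)
        return F.normalize(vecs, dim=-1), self.type_head(vecs)

def multitask_loss(enc, anchors, positives, type_labels):
    """InfoNCE over (anchor, positive) phrase pairs + type classification."""
    a, logits_a = enc(anchors)
    p, _ = enc(positives)
    sim = a @ p.t() / TAU                          # in-batch negatives
    targets = torch.arange(a.size(0))              # matching pairs on the diagonal
    contrastive = F.cross_entropy(sim, targets)
    aux = F.cross_entropy(logits_a, type_labels)
    return contrastive + ALPHA * aux

# Tiny usage example with random data
enc = PhraseEncoder()
anchors = torch.randint(0, 30522, (8, 4))
positives = torch.randint(0, 30522, (8, 4))
types = torch.randint(0, NUM_TYPES, (8,))
loss = multitask_loss(enc, anchors, positives, types)
loss.backward()
```

Only the two losses are shown here; the character-level features and the three granularities of data augmentation mentioned in the abstract would enter this pipeline before or during encoding.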
- Anthology ID: 2024.findings-eacl.66
- Volume: Findings of the Association for Computational Linguistics: EACL 2024
- Month: March
- Year: 2024
- Address: St. Julian’s, Malta
- Editors: Yvette Graham, Matthew Purver
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 983–994
- URL: https://aclanthology.org/2024.findings-eacl.66
- Cite (ACL): Lihu Chen, Gael Varoquaux, and Fabian Suchanek. 2024. Learning High-Quality and General-Purpose Phrase Representations. In Findings of the Association for Computational Linguistics: EACL 2024, pages 983–994, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal): Learning High-Quality and General-Purpose Phrase Representations (Chen et al., Findings 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.findings-eacl.66.pdf