Abstract
This study introduces a new method for distance-based unsupervised topical text classification using contextual embeddings. The method applies and tailors sentence embeddings for distance-based topical text classification. This is achieved by leveraging the semantic similarity between topic labels and text content, and reinforcing the relationship between them in a shared semantic space. The proposed method outperforms a wide range of existing sentence embeddings on average by 35%. Presenting an alternative to the commonly used transformer-based zero-shot general-purpose classifiers for multiclass text classification, the method demonstrates significant advantages in terms of computational efficiency and flexibility, while maintaining comparable or improved classification results.

- Anthology ID:
- 2023.ranlp-1.64
- Volume:
- Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
- Month:
- September
- Year:
- 2023
- Address:
- Varna, Bulgaria
- Editors:
- Ruslan Mitkov, Galia Angelova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Pages:
- 586–597
- URL:
- https://aclanthology.org/2023.ranlp-1.64
- Cite (ACL):
- Andriy Kosar, Guy De Pauw, and Walter Daelemans. 2023. Advancing Topical Text Classification: A Novel Distance-Based Method with Contextual Embeddings. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 586–597, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- Advancing Topical Text Classification: A Novel Distance-Based Method with Contextual Embeddings (Kosar et al., RANLP 2023)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2023.ranlp-1.64.pdf
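
Illustrative sketch (not from the paper): the abstract describes classifying a text by its distance to topic labels in a shared embedding space. The snippet below shows only that general idea, assigning a text to the label whose vector is nearest under cosine distance. It does not reproduce the authors' tailoring of sentence embeddings. The function names and the 3-dimensional vectors are hypothetical stand-ins for real sentence-embedding output, and all vectors are assumed to be non-zero.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def classify(text_vec, label_vecs):
    """Pick the label whose embedding is closest to the text embedding."""
    return min(
        label_vecs,
        key=lambda label: cosine_distance(text_vec, label_vecs[label]),
    )


# Hypothetical toy embeddings; a real system would obtain these from a
# sentence-embedding model applied to the label names and to the text.
LABEL_VECS = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.9, 0.1],
    "technology": [0.0, 0.2, 0.9],
}
TEXT_VEC = [0.8, 0.3, 0.1]

predicted = classify(TEXT_VEC, LABEL_VECS)
print(predicted)
```

Because no classifier is trained, changing the set of topics in this sketch only requires embedding the new label names, which is one reading of the flexibility the abstract mentions.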