Learning Thematic Similarity Metric from Article Sections Using Triplet Networks
Liat Ein Dor, Yosi Mass, Alon Halfon, Elad Venezian, Ilya Shnayderman, Ranit Aharonov, Noam Slonim
Abstract
In this paper we suggest to leverage the partition of articles into sections, in order to learn thematic similarity metric between sentences. We assume that a sentence is thematically closer to sentences within its section than to sentences from other sections. Based on this assumption, we use Wikipedia articles to automatically create a large dataset of weakly labeled sentence triplets, composed of a pivot sentence, one sentence from the same section and one from another section. We train a triplet network to embed sentences from the same section closer. To test the performance of the learned embeddings, we create and release a sentence clustering benchmark. We show that the triplet network learns useful thematic metrics, that significantly outperform state-of-the-art semantic similarity methods and multipurpose embeddings on the task of thematic clustering of sentences. We also show that the learned embeddings perform well on the task of sentence semantic similarity prediction.- Anthology ID:
- P18-2009
- Volume:
- Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Editors:
- Iryna Gurevych, Yusuke Miyao
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 49–54
- Language:
- URL:
- https://aclanthology.org/P18-2009
- DOI:
- 10.18653/v1/P18-2009
- Cite (ACL):
- Liat Ein Dor, Yosi Mass, Alon Halfon, Elad Venezian, Ilya Shnayderman, Ranit Aharonov, and Noam Slonim. 2018. Learning Thematic Similarity Metric from Article Sections Using Triplet Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 49–54, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- Learning Thematic Similarity Metric from Article Sections Using Triplet Networks (Ein Dor et al., ACL 2018)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/P18-2009.pdf