Abstract
Measuring semantic similarity between texts is a crucial task in natural language processing. While existing semantic text matching focuses on pairs of similar-length sequences, matching texts of non-comparable lengths has broader applications in specific domains, such as comparing professional document summaries with their full content. Current approaches struggle with text pairs of non-comparable lengths due to truncation issues. To address this, we split texts into natural sentences and decouple sentence representations using supervised contrastive learning (SCL). Meanwhile, we adopt the embedded topic model (ETM) for domain-specific data. Our experiments on three well-studied datasets demonstrate the effectiveness of our model, based on decoupled and topic-informed sentence embeddings, in matching texts of significantly different lengths.
- Anthology ID:
- 2024.findings-naacl.81
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2024
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1274–1280
- URL:
- https://aclanthology.org/2024.findings-naacl.81
- Cite (ACL):
- Xixi Zhou, Chunbin Gu, Xin Jie, Jiajun Bu, and Haishuai Wang. 2024. Matching Varying-Length Texts via Topic-Informed and Decoupled Sentence Embeddings. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1274–1280, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Matching Varying-Length Texts via Topic-Informed and Decoupled Sentence Embeddings (Zhou et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/fix-volume-bibkeys/2024.findings-naacl.81.pdf