Abstract
Since the advent of word embedding methods, the representation of longer pieces of text such as sentences and paragraphs has attracted growing interest, especially for textual similarity tasks. Mikolov et al. (2013) demonstrated that words and phrases exhibit linear structures that make it possible to combine words meaningfully by element-wise addition of their vector representations. More recently, Arora et al. (2017) showed that removing the projections of the weighted average of word embedding vectors on their first principal components outperforms sophisticated supervised methods, including RNNs and LSTMs. Inspired by the findings of Mikolov et al. (2013) and Arora et al. (2017), and by the bilingual word mapping technique presented in Artetxe et al. (2016), we introduce MappSent, a novel approach for textual similarity. Based on a linear sentence embedding representation, its principle is to build a matrix that maps sentences into a joint subspace where similar sets of sentences are pushed closer together. We evaluate our approach on the SemEval 2016/2017 question-to-question similarity task and show that, overall, MappSent achieves competitive results and in most cases outperforms state-of-the-art methods.
- Anthology ID:
- R17-1040
- Volume:
- Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
- Month:
- September
- Year:
- 2017
- Address:
- Varna, Bulgaria
- Editors:
- Ruslan Mitkov, Galia Angelova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 291–300
- URL:
- https://doi.org/10.26615/978-954-452-049-6_040
- DOI:
- 10.26615/978-954-452-049-6_040
- Cite (ACL):
- Amir Hazem, Basma El Amel Boussaha, and Nicolas Hernandez. 2017. MappSent: a Textual Mapping Approach for Question-to-Question Similarity. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 291–300, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal):
- MappSent: a Textual Mapping Approach for Question-to-Question Similarity (Hazem et al., RANLP 2017)
- PDF:
- https://doi.org/10.26615/978-954-452-049-6_040
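
The building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the smoothing constant `a`, the toy random data, and the use of an orthogonal (Procrustes) solution for the mapping matrix are all assumptions made for illustration, since the abstract does not specify how MappSent learns its mapping. The first two functions follow the weighted-average and principal-direction-removal idea attributed to Arora et al. (2017), and the third shows one common way to fit a linear map between two sets of paired vectors.

```python
import numpy as np

def weighted_average(word_vectors, word_probs, a=1e-3):
    """Average word vectors with weights a / (a + p(w)); `a` is an assumed constant."""
    weights = a / (a + np.asarray(word_probs))
    return weights @ np.asarray(word_vectors) / len(word_probs)

def remove_first_component(sentence_matrix):
    """Subtract each row's projection on the first right singular vector
    (uncentred SVD, used here as the 'first principal direction')."""
    _, _, vt = np.linalg.svd(sentence_matrix, full_matrices=False)
    u = vt[0]
    return sentence_matrix - np.outer(sentence_matrix @ u, u)

def orthogonal_mapping(source, target):
    """Orthogonal W minimising ||source @ W - target||_F (assumed mapping choice)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Toy data: 10 hypothetical "sentence" vectors of dimension 4.
rng = np.random.default_rng(0)
sentences = rng.normal(size=(10, 4))
cleaned = remove_first_component(sentences)
```

With paired sets of similar sentences stacked row by row in `source` and `target`, `source @ orthogonal_mapping(source, target)` projects the first set into the space of the second, after which `cosine` can rank candidate questions.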