Abstract
We study methods for learning sentence embeddings with syntactic structure. We focus on methods of learning syntactic sentence-embeddings by using a multilingual parallel-corpus augmented by Universal Parts-of-Speech tags. We evaluate the quality of the learned embeddings by examining sentence-level nearest neighbours and functional dissimilarity in the embedding space. We also evaluate the ability of the method to learn syntactic sentence-embeddings for low-resource languages and demonstrate strong evidence for transfer learning. Our results show that syntactic sentence-embeddings can be learned while using less training data, fewer model parameters, and resulting in better evaluation metrics than state-of-the-art language models.
- Anthology ID:
- D19-5521
- Volume:
- Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
- Venue:
- WNUT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 153–159
- URL:
- https://aclanthology.org/D19-5521
- DOI:
- 10.18653/v1/D19-5521
- Cite (ACL):
- Chen Liu, Anderson De Andrade, and Muhammad Osama. 2019. Exploring Multilingual Syntactic Sentence Representations. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 153–159, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Exploring Multilingual Syntactic Sentence Representations (Liu et al., WNUT 2019)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/D19-5521.pdf
- Code:
- ccliu2/syn-emb
- Data:
- OpenSubtitles