Exploring Multilingual Syntactic Sentence Representations

Chen Liu, Anderson De Andrade, Muhammad Osama


Abstract
We study methods for learning sentence embeddings with syntactic structure. We focus on learning syntactic sentence embeddings from a multilingual parallel corpus augmented with Universal Part-of-Speech tags. We evaluate the quality of the learned embeddings by examining sentence-level nearest neighbours and functional dissimilarity in the embedding space. We also evaluate the method's ability to learn syntactic sentence embeddings for low-resource languages and find strong evidence of transfer learning. Our results show that syntactic sentence embeddings can be learned with less training data and fewer model parameters than state-of-the-art language models, while achieving better evaluation metrics.
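The abstract evaluates embeddings by examining sentence-level nearest neighbours. The sketch below shows what such a check can look like, assuming sentence embeddings have already been produced by some encoder. Cosine similarity is an assumed choice of metric. The toy vectors are made-up values, and the names `cosine`, `nearest_neighbours`, and `TOY_EMBEDDINGS` are hypothetical. None of this is taken from the authors' ccliu2/syn-emb code.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(
    query: str, embeddings: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k sentences most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [(s, cosine(q, e)) for s, e in embeddings.items() if s != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (invented values, not model output).
# In a syntactic embedding space, sentences sharing a POS pattern such as
# DET NOUN VERB PUNCT would be expected to lie close together, even across
# languages; "Did you eat?" (AUX PRON VERB PUNCT) is placed far away.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "The cat sleeps.": [0.90, 0.10, 0.00],
    "Le chat dort.": [0.88, 0.12, 0.01],
    "A dog barks.": [0.85, 0.15, 0.05],
    "Did you eat?": [0.10, 0.90, 0.30],
}

if __name__ == "__main__":
    for sentence, score in nearest_neighbours("The cat sleeps.", TOY_EMBEDDINGS, k=2):
        print(f"{score:.4f}  {sentence}")
```

With these toy values, the French sentence with the same tag pattern ranks first and the interrogative ranks last. This illustrates the kind of cross-lingual, syntax-driven neighbourhood such an evaluation looks for.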
Anthology ID:
D19-5521
Volume:
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
153–159
URL:
https://aclanthology.org/D19-5521
DOI:
10.18653/v1/D19-5521
Cite (ACL):
Chen Liu, Anderson De Andrade, and Muhammad Osama. 2019. Exploring Multilingual Syntactic Sentence Representations. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 153–159, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Exploring Multilingual Syntactic Sentence Representations (Liu et al., WNUT 2019)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/D19-5521.pdf
Code:
ccliu2/syn-emb
Data:
OpenSubtitles