Abstract
The purpose of the research is to answer the question whether linguistic information is retained in vector representations of sentences. We introduce a method of analysing the content of sentence embeddings based on universal probing tasks, along with the classification datasets for two contrasting languages. We perform a series of probing and downstream experiments with different types of sentence embeddings, followed by a thorough analysis of the experimental results. Aside from dependency parser-based embeddings, linguistic information is retained best in the recently proposed LASER sentence embeddings.

- Anthology ID: P19-1573
- Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month: July
- Year: 2019
- Address: Florence, Italy
- Editors: Anna Korhonen, David Traum, Lluís Màrquez
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 5729–5739
- URL: https://aclanthology.org/P19-1573
- DOI: 10.18653/v1/P19-1573
- Cite (ACL): Katarzyna Krasnowska-Kieraś and Alina Wróblewska. 2019. Empirical Linguistic Study of Sentence Embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5729–5739, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): Empirical Linguistic Study of Sentence Embeddings (Krasnowska-Kieraś & Wróblewska, ACL 2019)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/P19-1573.pdf
- Data: SentEval, Universal Dependencies