Abstract
Though there has recently been increased interest in how pre-trained language models encode different linguistic features, systematic comparisons between languages with different morphology and syntax are still lacking. In this paper, using BERT as an example of a pre-trained model, we compare how three typologically different languages (English, Korean, and Russian) encode morphological and syntactic features across different layers. In particular, we contrast languages which differ in a particular aspect, such as flexibility of word order, head directionality, morphological type, presence of grammatical gender, and morphological richness, across four different tasks.
- Anthology ID:
- 2022.sigtyp-1.4
- Volume:
- Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, Washington
- Editors:
- Ekaterina Vylomova, Edoardo Ponti, Ryan Cotterell
- Venue:
- SIGTYP
- SIG:
- SIGTYP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 27–35
- URL:
- https://aclanthology.org/2022.sigtyp-1.4
- DOI:
- 10.18653/v1/2022.sigtyp-1.4
- Cite (ACL):
- Yulia Otmakhova, Karin Verspoor, and Jey Han Lau. 2022. Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages. In Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 27–35, Seattle, Washington. Association for Computational Linguistics.
- Cite (Informal):
- Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages (Otmakhova et al., SIGTYP 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2022.sigtyp-1.4.pdf