Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages

Yulia Otmakhova, Karin Verspoor, Jey Han Lau


Abstract
Though there has recently been increased interest in how pre-trained language models encode different linguistic features, systematic comparisons between languages with different morphology and syntax are still lacking. In this paper, using BERT as an example of a pre-trained model, we compare how three typologically different languages (English, Korean, and Russian) encode morphological and syntactic features across layers. In particular, we contrast languages that differ in a specific aspect, such as flexibility of word order, head directionality, morphological type, presence of grammatical gender, and morphological richness, across four different tasks.
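The abstract describes comparing how linguistic features are encoded across BERT layers. Below is a minimal, hypothetical sketch of the kind of layer-wise representation extraction such probing studies typically build on, using the HuggingFace transformers API. This is not the authors' code; the model name, the example sentence, and the mean-pooling step are illustrative assumptions.

# Minimal sketch: extract per-layer BERT representations that a probing
# classifier could use to test how a linguistic feature is encoded.
# NOT the authors' setup; model choice and pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # assumption: any BERT variant

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

sentence = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors, each of
# shape (batch, seq_len, hidden_dim); index 0 is the embedding layer.
for layer_idx, layer in enumerate(outputs.hidden_states):
    # Mean-pool over tokens to get one sentence vector per layer; these
    # vectors could then feed a simple per-layer probing classifier.
    sentence_vec = layer.mean(dim=1)
    print(f"layer {layer_idx}: vector shape {tuple(sentence_vec.shape)}")

In a probing study, a lightweight classifier (e.g., logistic regression) would be trained on each layer's vectors to predict a linguistic feature, and per-layer accuracy would indicate where that feature is encoded.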
Anthology ID:
2022.sigtyp-1.4
Volume:
Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
July
Year:
2022
Address:
Seattle, Washington
Editors:
Ekaterina Vylomova, Edoardo Ponti, Ryan Cotterell
Venue:
SIGTYP
SIG:
SIGTYP
Publisher:
Association for Computational Linguistics
Pages:
27–35
URL:
https://aclanthology.org/2022.sigtyp-1.4
DOI:
10.18653/v1/2022.sigtyp-1.4
Cite (ACL):
Yulia Otmakhova, Karin Verspoor, and Jey Han Lau. 2022. Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages. In Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 27–35, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages (Otmakhova et al., SIGTYP 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2022.sigtyp-1.4.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-5/2022.sigtyp-1.4.mp4