Abstract
In this paper, we propose a method to detect whether words in two similar languages, Assamese and Bengali, are cognates. We combine phonetic, semantic, and articulatory features and use the cognate detection task to analyze the relative informational contribution of each type of feature to distinguishing words in the two similar languages. In addition, since support for low-resourced languages like Assamese can be weak or nonexistent in some multilingual language models, we create a monolingual Assamese Transformer model and explore augmenting multilingual models with monolingual models using affine transformation techniques between vector spaces.
- Anthology ID:
- 2022.vardial-1.5
- Volume:
- Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 41–53
- URL:
- https://aclanthology.org/2022.vardial-1.5
- Cite (ACL):
- Abhijnan Nath, Rahul Ghosh, and Nikhil Krishnaswamy. 2022. Phonetic, Semantic, and Articulatory Features in Assamese-Bengali Cognate Detection. In Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 41–53, Gyeongju, Republic of Korea. Association for Computational Linguistics.
- Cite (Informal):
- Phonetic, Semantic, and Articulatory Features in Assamese-Bengali Cognate Detection (Nath et al., VarDial 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.vardial-1.5.pdf