Abstract
In this work, I present the ÚFAL submission for the supervised task of detecting cognates and derivatives. Cognates are word pairs in different languages that share an origin in earlier attested forms in an ancestral language, while derivatives come directly from another language. For the task, I developed a gradient boosted tree classifier trained on linguistic and statistical features. The solution ranked first of the two submitted systems, with an 87% F1 score on the test split. This write-up gives insight into the system and shows the importance of linguistic features and character-level statistics for the task.
- Anthology ID:
- 2023.sigtyp-1.14
- Volume:
- Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editors:
- Lisa Beinborn, Koustava Goswami, Saliha Muradoğlu, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Edoardo M. Ponti, Ryan Cotterell, Ekaterina Vylomova
- Venue:
- SIGTYP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 132–136
- URL:
- https://aclanthology.org/2023.sigtyp-1.14
- DOI:
- 10.18653/v1/2023.sigtyp-1.14
- Cite (ACL):
- Tomasz Limisiewicz. 2023. ÚFAL Submission for SIGTYP Supervised Cognate Detection Task. In Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 132–136, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- ÚFAL Submission for SIGTYP Supervised Cognate Detection Task (Limisiewicz, SIGTYP 2023)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2023.sigtyp-1.14.pdf