Abstract
Word embeddings and language models are the building blocks of modern deep neural network-based natural language processing. They have been extensively explored in high-resource languages, where they provide state-of-the-art (SOTA) performance for a wide range of downstream tasks. However, word embeddings remain largely unexplored in low-resource languages such as Assamese, and there has been little study evaluating their performance on downstream tasks in such languages. In this research, we survey the current state of Assamese pre-trained word embeddings and evaluate their performance on sequence labeling tasks, namely part-of-speech (POS) tagging and named entity recognition (NER). To assess the effectiveness of the embeddings, we conduct experiments using both individual word embeddings and ensembles of them; an ensemble of three word embeddings outperforms the other approaches. The paper describes the outcomes of these experiments. The results of this comparative performance evaluation may assist researchers in choosing an Assamese pre-trained word embedding for downstream tasks.
- Anthology ID:
- 2024.lrec-main.568
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 6418–6425
- URL:
- https://aclanthology.org/2024.lrec-main.568
- Cite (ACL):
- Dhrubajyoti Pathak, Sukumar Nandi, and Priyankoo Sarmah. 2024. Evaluating Performance of Pre-trained Word Embeddings on Assamese, a Low-resource Language. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6418–6425, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Evaluating Performance of Pre-trained Word Embeddings on Assamese, a Low-resource Language (Pathak et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2024.lrec-main.568.pdf