Leveraging Pre-Trained Embeddings for Welsh Taggers

Ignatius Ezeani, Scott Piao, Steven Neale, Paul Rayson, Dawn Knight


Abstract
While word embedding models have been applied successfully to downstream Natural Language Processing (NLP) tasks, their benefits for low-resource languages are somewhat limited by the lack of adequate data for training the models. However, NLP research on low-resource languages has consistently sought ways to harness pre-trained models to improve the performance of systems built to process these languages, without the need to reinvent the wheel. Welsh is one such language. In this paper, we present the results of our experiments on learning a simple multi-task neural network model for part-of-speech and semantic tagging for Welsh using a pre-trained embedding model from FastText. We compared our model’s performance with that of the existing rule-based stand-alone part-of-speech and semantic taggers. Despite its simplicity and its capacity to perform both tasks simultaneously, our tagger compared very well with the existing taggers.
Anthology ID:
W19-4332
Volume:
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
270–280
URL:
https://aclanthology.org/W19-4332
DOI:
10.18653/v1/W19-4332
Cite (ACL):
Ignatius Ezeani, Scott Piao, Steven Neale, Paul Rayson, and Dawn Knight. 2019. Leveraging Pre-Trained Embeddings for Welsh Taggers. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 270–280, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Leveraging Pre-Trained Embeddings for Welsh Taggers (Ezeani et al., RepL4NLP 2019)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W19-4332.pdf