Abstract
We describe the development of a dedicated, high-accuracy part-of-speech (PoS) tagging solution for Faroese, a North Germanic language with about 50,000 speakers. To achieve this, ABLTagger, a state-of-the-art neural PoS tagger for Icelandic, was trained on a 100,000-word PoS-tagged corpus for Faroese, standardised with methods previously applied to Icelandic corpora. The tagger was supplemented with a novel Experimental Database of Faroese Inflection (EDFM), which contains morphological information on 67,488 Faroese words with about one million inflectional forms. This approach produced a PoS-tagging model for Faroese that achieves 91.40% overall accuracy when evaluated with 10-fold cross-validation, currently the highest reported accuracy for a dedicated Faroese PoS tagger. The tagging model, the morphological database, a proposed revised PoS tagset for Faroese, and a revised and standardised PoS-tagged corpus are all presented as products of this project and are made available for further research in Faroese language technology.
- Anthology ID:
- 2020.icon-main.65
- Volume:
- Proceedings of the 17th International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2020
- Address:
- Indian Institute of Technology Patna, Patna, India
- Editors:
- Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
- Venue:
- ICON
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 481–490
- URL:
- https://aclanthology.org/2020.icon-main.65
- Cite (ACL):
- Hinrik Hafsteinsson and Anton Karl Ingason. 2020. Developing a Faroese PoS-tagging solution using Icelandic methods. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 481–490, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Developing a Faroese PoS-tagging solution using Icelandic methods (Hafsteinsson & Ingason, ICON 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2020.icon-main.65.pdf