Part-of-speech tagging of Swedish texts in the neural era

Yvonne Adesam, Aleksandrs Berdicevskis


Abstract
We train and test five open-source taggers, which use different methods, on three Swedish corpora that are of comparable size but use different tagsets. The KB-Bert tagger achieves the highest accuracy for part-of-speech and morphological tagging, while being fast enough for practical use. We also compare performance across tagsets and across different genres in one of the corpora. We perform a manual error analysis and a statistical analysis of the factors that affect how difficult specific tags are. Finally, we test ensemble methods, showing that a small (but not significant) improvement over the best-performing tagger can be achieved.
Anthology ID:
2021.nodalida-main.20
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
31 May–2 June
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
200–209
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2021.nodalida-main.20/
Cite (ACL):
Yvonne Adesam and Aleksandrs Berdicevskis. 2021. Part-of-speech tagging of Swedish texts in the neural era. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 200–209, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
Part-of-speech tagging of Swedish texts in the neural era (Adesam & Berdicevskis, NoDaLiDa 2021)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2021.nodalida-main.20.pdf
Code:
aleksandrsberdicevskis/swetagging2021
Data:
Universal Dependencies