Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks

Gor Arakelyan, Karen Hambardzumyan, Hrant Khachatrian


Abstract
This paper describes our submission to the CoNLL 2018 UD Shared Task. We extended an LSTM-based neural network designed for sequence tagging so that it also generates character-level sequences. The network was jointly trained to produce lemmas, part-of-speech tags and morphological features. Sentence segmentation, tokenization and dependency parsing were handled by the UDPipe 1.2 baseline. The results demonstrate the viability of the proposed multitask architecture, although its performance remains far from the state of the art.
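As a rough illustration of what joint training over the three outputs can look like, the sketch below combines per-token losses for the part-of-speech tag, the morphological features and the character-level lemma into one objective. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the loss, so the function names (`cross_entropy`, `joint_loss`), the weighted-sum form, the equal default weights, and the treatment of morphological features as a single class are all hypothetical.

```python
import math


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold_index])


def joint_loss(pos_probs, pos_gold,
               feat_probs, feat_gold,
               lemma_char_probs, lemma_gold,
               weights=(1.0, 1.0, 1.0)):
    """Hypothetical multitask loss for one token.

    pos_probs / feat_probs: predicted distributions over tags / feature bundles.
    lemma_char_probs: one predicted distribution per generated lemma character.
    lemma_gold: gold character indices, same length as lemma_char_probs.
    weights: assumed task weights (not taken from the paper).
    """
    pos_term = cross_entropy(pos_probs, pos_gold)
    feat_term = cross_entropy(feat_probs, feat_gold)
    # The lemma is a character sequence, so its loss is summed over positions.
    lemma_term = sum(cross_entropy(p, g)
                     for p, g in zip(lemma_char_probs, lemma_gold))
    w_pos, w_feat, w_lemma = weights
    return w_pos * pos_term + w_feat * feat_term + w_lemma * lemma_term


if __name__ == "__main__":
    # Toy distributions: 3 POS tags, 2 feature bundles, a 2-character lemma.
    loss = joint_loss([0.7, 0.2, 0.1], 0,
                      [0.4, 0.6], 1,
                      [[0.9, 0.1], [0.2, 0.8]], [0, 1])
    print(round(loss, 4))
```

Because all three terms would be computed from the same shared encoder, a single backward pass through such a sum updates that encoder for every task at once, which is the usual motivation for a multitask setup of this kind.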
Anthology ID:
K18-2018
Volume:
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Month:
October
Year:
2018
Address:
Brussels, Belgium
Editors:
Daniel Zeman, Jan Hajič
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
180–186
URL:
https://aclanthology.org/K18-2018
DOI:
10.18653/v1/K18-2018
Cite (ACL):
Gor Arakelyan, Karen Hambardzumyan, and Hrant Khachatrian. 2018. Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 180–186, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks (Arakelyan et al., CoNLL 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/K18-2018.pdf
Code:
YerevaNN/JointUD
Data:
Universal Dependencies