Abstract
This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. Character-based approaches are attractive because they can handle rare and unseen words gracefully. We evaluate on 14 languages and observe consistent gains over a state-of-the-art morphological tagger across all languages except English and French, where we match the state of the art. We compare two architectures for computing character-based word vectors, using recurrent (RNN) and convolutional (CNN) nets. We show that the CNN-based approach performs slightly worse and less consistently than the RNN-based approach. Small but systematic gains are observed when combining the two architectures by ensembling.
- Anthology ID:
- E17-1048
- Volume:
- Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Editors:
- Mirella Lapata, Phil Blunsom, Alexander Koller
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 505–513
- URL:
- https://aclanthology.org/E17-1048
- Cite (ACL):
- Georg Heigold, Guenter Neumann, and Josef van Genabith. 2017. An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 505–513, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages (Heigold et al., EACL 2017)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/E17-1048.pdf