Neural Sequence-to-sequence Learning of Internal Word Structure

Tatyana Ruzsics, Tanja Samardžić



Abstract
Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison. In this paper, we present a neural encoder-decoder model for learning canonical morphological segmentation. Our model combines character-level sequence-to-sequence transformation with a language model over canonical segments. We obtain up to 4% improvement over a strong character-level encoder-decoder baseline for three languages. Our model outperforms the previous state-of-the-art for two languages, while eliminating the need for external resources such as large dictionaries. Finally, by comparing the performance of encoder-decoder and classical statistical machine translation systems trained with and without corpus counts, we show that including corpus counts is beneficial to both approaches.
Anthology ID:
K17-1020
Volume:
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Roger Levy, Lucia Specia
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
184–194
URL:
https://aclanthology.org/K17-1020
DOI:
10.18653/v1/K17-1020
Cite (ACL):
Tatyana Ruzsics and Tanja Samardžić. 2017. Neural Sequence-to-sequence Learning of Internal Word Structure. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 184–194, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Neural Sequence-to-sequence Learning of Internal Word Structure (Ruzsics & Samardžić, CoNLL 2017)
PDF:
https://aclanthology.org/K17-1020.pdf