Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies

Amir More, Reut Tsarfaty

Abstract
Parsing texts into Universal Dependencies (UD) in realistic scenarios requires, as a first tier, infrastructure for the morphological analysis and disambiguation (MA&D) of typologically different languages. MA&D is particularly challenging in morphologically rich languages (MRLs), where ambiguous space-delimited tokens need to be disambiguated with respect to their constituent morphemes, each morpheme carrying its own tag and a rich set of features. Here we present a novel, language-agnostic framework for MA&D, based on a transition system with two variants, word-based and morpheme-based, and a dedicated transition to mitigate the biases of variable-length morpheme sequences. Our experiments on a Modern Hebrew case study show state-of-the-art results, and the morpheme-based morphological disambiguation (MD) variant consistently outperforms the word-based variant. We further illustrate the utility and multilingual coverage of our framework by morphologically analyzing and disambiguating the large set of languages in the UD treebanks.
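The abstract's point about variable-length morpheme sequences can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation (their code is in the habeanf/yap repository), and it does not model their transition system or their dedicated transition. The transliterated Hebrew token "bcl" with its three readings, the UD-style tags, and all weights are illustrative assumptions. The sketch only shows why summing one score per morpheme favours longer lattice paths, why naive per-morpheme averaging flips the bias toward shorter paths, and how a word-based decision that scores a whole analysis at once gives every candidate exactly one term.

```python
# Toy sketch (hypothetical data): length bias when scoring lattice paths.
Morpheme = tuple[str, str]  # (form, tag)

# Hypothetical lattice: token -> candidate analyses (morpheme sequences).
LATTICE: dict[str, list[list[Morpheme]]] = {
    "bcl": [
        [("bcl", "NOUN")],                              # 'onion'
        [("b", "ADP"), ("cl", "NOUN")],                 # 'in a shadow'
        [("b", "ADP"), ("h", "DET"), ("cl", "NOUN")],   # 'in the shadow'
    ],
}

# Hypothetical per-morpheme weights, as a linear model might learn them.
MORPHEME_W: dict[Morpheme, float] = {
    ("bcl", "NOUN"): 1.5, ("b", "ADP"): 0.9, ("cl", "NOUN"): 0.8, ("h", "DET"): 0.6,
}

# Hypothetical whole-analysis weights for a word-based scorer.
WORD_W: dict[tuple[Morpheme, ...], float] = {
    (("bcl", "NOUN"),): 1.2,
    (("b", "ADP"), ("cl", "NOUN")): 1.0,
    (("b", "ADP"), ("h", "DET"), ("cl", "NOUN")): 1.1,
}


def morpheme_based_score(analysis: list[Morpheme]) -> float:
    """One scored decision per morpheme: longer paths sum more terms."""
    return sum(MORPHEME_W.get(m, 0.0) for m in analysis)


def per_morpheme_average(analysis: list[Morpheme]) -> float:
    """Naive length normalisation: now shorter paths tend to win."""
    return morpheme_based_score(analysis) / len(analysis)


def word_based_score(analysis: list[Morpheme]) -> float:
    """One scored decision per token, whatever the analysis length."""
    return WORD_W.get(tuple(analysis), 0.0)


def disambiguate(tokens: list[str], score) -> list[list[Morpheme]]:
    """Pick the highest-scoring analysis for each token independently."""
    return [max(LATTICE[t], key=score) for t in tokens]


if __name__ == "__main__":
    scorers = [("sum", morpheme_based_score),
               ("average", per_morpheme_average),
               ("word", word_based_score)]
    for name, fn in scorers:
        best = disambiguate(["bcl"], fn)[0]
        print(name, " ".join(f"{f}/{t}" for f, t in best))
```

With these made-up weights, the summed morpheme scores pick the longest path (b/ADP h/DET cl/NOUN, 2.3 versus 1.7 and 1.5), while the averaged and word-based scorers both pick bcl/NOUN. The choice here depends entirely on the weights; the point is only that comparing sums over paths of different lengths is not neutral.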
Anthology ID:
C16-1033
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
337–348
URL:
https://aclanthology.org/C16-1033
Cite (ACL):
Amir More and Reut Tsarfaty. 2016. Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 337–348, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies (More & Tsarfaty, COLING 2016)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/C16-1033.pdf
Code:
habeanf/yap