A Fast and Accurate Partially Deterministic Morphological Analysis

Hajime Morita, Tomoya Iwakura


Abstract
This paper proposes a partially deterministic morphological analysis method for improved processing speed. Maximum matching is a fast deterministic method for morphological analysis. However, it tends to degrade analysis performance because it does not consider contextual information. To use maximum matching safely, we propose the use of Context Independent Strings (CISs), which are strings that are unambiguous in terms of morphological analysis. Our method first identifies CISs in a sentence using maximum matching without contextual information, then analyzes the unprocessed part of the sentence using a bi-gram-based morphological analysis model. We evaluate the method on a Japanese morphological analysis task. The experimental results show a 30% reduction in running time while maintaining improved accuracy.
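The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `split_by_cis`, the `cis_lexicon` set, and the toy input are hypothetical, and the lexicon is simply assumed to contain strings already known to be context independent. The sketch performs greedy longest-match lookup from left to right and labels every uncovered stretch as "rest", which is the part a bi-gram-based model would then analyze (that model is not shown).

```python
from typing import List, Set, Tuple


def split_by_cis(sentence: str, cis_lexicon: Set[str]) -> List[Tuple[str, str]]:
    """Split a sentence into ('cis', s) and ('rest', s) spans.

    Assumption: cis_lexicon holds strings that are unambiguous regardless
    of context. How such strings are collected is outside this sketch.
    """
    # Longest entry bounds how far ahead we need to look at each position.
    max_len = max((len(w) for w in cis_lexicon), default=0)
    spans: List[Tuple[str, str]] = []
    pending: List[str] = []  # characters not covered by any CIS so far
    i = 0
    while i < len(sentence):
        match = None
        # Maximum matching: try the longest candidate first.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in cis_lexicon:
                match = candidate
                break
        if match is None:
            # No CIS starts here; defer this character to the fallback model.
            pending.append(sentence[i])
            i += 1
        else:
            if pending:
                spans.append(("rest", "".join(pending)))
                pending = []
            spans.append(("cis", match))
            i += len(match)
    if pending:
        spans.append(("rest", "".join(pending)))
    return spans


if __name__ == "__main__":
    # Toy, language-neutral input; a real system would use a Japanese lexicon.
    print(split_by_cis("abcdxyzab", {"ab", "abc", "xyz"}))
```

In this simplified form, only the "rest" spans would need the slower context-sensitive analysis, which is where a running-time saving of the kind reported in the abstract could come from.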
Anthology ID:
R19-1093
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
804–809
URL:
https://aclanthology.org/R19-1093
DOI:
10.26615/978-954-452-056-4_093
Cite (ACL):
Hajime Morita and Tomoya Iwakura. 2019. A Fast and Accurate Partially Deterministic Morphological Analysis. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 804–809, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
A Fast and Accurate Partially Deterministic Morphological Analysis (Morita & Iwakura, RANLP 2019)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/R19-1093.pdf