What’s Wrong with Hebrew NLP? And How to Make it Right

Reut Tsarfaty, Shoval Sadde, Stav Klein, Amit Seker


Abstract
For languages with simple morphology, such as English, automatic annotation pipelines such as spaCy or Stanford’s CoreNLP successfully serve projects in academia and industry. For many morphologically rich languages (MRLs), similar pipelines show sub-optimal performance that limits their applicability for text analysis in research and industry. The sub-optimal performance is mainly due to errors in early morphological disambiguation decisions, which cannot be recovered from later in the pipeline and yield incoherent annotations overall. This paper describes the design and use of the ONLP suite, a joint morpho-syntactic infrastructure for processing Modern Hebrew texts. The joint inference over morphology and syntax substantially limits error propagation and leads to high accuracy. ONLP provides rich and expressive annotations which already serve diverse academic and commercial needs. Its accompanying demo further serves educational activities, introducing the intricacies of Hebrew NLP to researchers and non-researchers alike.
Anthology ID:
D19-3044
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Sebastian Padó, Ruihong Huang
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
259–264
URL:
https://aclanthology.org/D19-3044
DOI:
10.18653/v1/D19-3044
Cite (ACL):
Reut Tsarfaty, Shoval Sadde, Stav Klein, and Amit Seker. 2019. What’s Wrong with Hebrew NLP? And How to Make it Right. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, pages 259–264, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
What’s Wrong with Hebrew NLP? And How to Make it Right (Tsarfaty et al., EMNLP-IJCNLP 2019)
PDF:
https://preview.aclanthology.org/naacl24-info/D19-3044.pdf