Abstract
We annotate a small corpus of religious Arabic with morphological segmentation boundaries and fine-grained segment-based part-of-speech tags. Experiments on both segmentation and POS tagging show that the segmenter and POS tagger trained on the religious corpus outperform those trained on the Arabic Treebank, although the Treebank is 21 times as large, which shows the need for building religious Arabic linguistic resources. The small corpus we annotate improves segmentation accuracy by about 5% absolute (from 90.84% to 95.70%), POS tagging by about 9% absolute (from 82.22% to 91.26%) when using gold-standard segmentation, and by 9.6% absolute (from 78.62% to 88.22%) when using automatic segmentation.
- Anthology ID:
- 2012.amta-caas14.9
- Volume:
- Fourth Workshop on Computational Approaches to Arabic-Script-based Languages
- Month:
- November 1
- Year:
- 2012
- Address:
- San Diego, California, USA
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 65–71
- URL:
- https://aclanthology.org/2012.amta-caas14.9
- Cite (ACL):
- Emad Mohamed. 2012. Morphological Segmentation and Part of Speech Tagging for Religious Arabic. In Fourth Workshop on Computational Approaches to Arabic-Script-based Languages, pages 65–71, San Diego, California, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Morphological Segmentation and Part of Speech Tagging for Religious Arabic (Mohamed, AMTA 2012)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2012.amta-caas14.9.pdf