Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, Lori Levin
Abstract
In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition – capitalization – is absent, as the language’s Perso-Arabic script does not make a distinction between uppercase and lowercase letters. We describe a system for deriving an inferred capitalization value from closely related languages by phonological similarity, and illustrate the system using several related Western Iranian languages.
- Anthology ID: L16-1529
- Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month: May
- Year: 2016
- Address: Portorož, Slovenia
- Venue: LREC
- Publisher: European Language Resources Association (ELRA)
- Pages: 3318–3324
- URL: https://aclanthology.org/L16-1529
- Cite (ACL): Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, and Lori Levin. 2016. Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3318–3324, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal): Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik (Littell et al., LREC 2016)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/L16-1529.pdf