@inproceedings{farhan-etal-2020-enhanced,
    title = "Enhanced {U}rdu Word Segmentation using Conditional Random Fields and Morphological Context Features",
    author = "Farhan, Aamir  and
      Islam, Mashrukh  and
      Sharma, Dipti Misra",
    editor = "Cunha, Rossana  and
      Shaikh, Samira  and
      Varis, Erika  and
      Georgi, Ryan  and
      Tsai, Alicia  and
      Anastasopoulos, Antonios  and
      Chandu, Khyathi Raghavi",
    booktitle = "Proceedings of the Fourth Widening Natural Language Processing Workshop",
    month = jul,
    year = "2020",
    address = "Seattle, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2020.winlp-1.41/",
    doi = "10.18653/v1/2020.winlp-1.41",
    pages = "156--159",
    abstract = "Word segmentation is a fundamental task for most of the NLP applications. Urdu adopts Nastalique writing style which does not have a concept of space. Furthermore, the inherent non-joining attributes of certain characters in Urdu create spaces within a word while writing in digital format. Thus, Urdu not only has space omission but also space insertion issues which make the word segmentation task challenging. In this paper, we improve upon the results of Zia, Raza and Athar (2018) by using a manually annotated corpus of 19,651 sentences along with morphological context features. Using the Conditional Random Field sequence modeler, our model achieves F1 score of 0.98 for word boundary identification and 0.92 for sub-word boundary identification tasks. The results demonstrated in this paper outperform the state-of-the-art methods."
}