Developing a Monolingual Sentence Simplification Corpus for Urdu
Yusra Anees, Sadaf Abdul Rauf, Nauman Iqbal, Abdul Basit Siddiqi
Abstract
Complex sentences are a hurdle in the learning process of language learners. Sentence simplification aims to convert a complex sentence into a simpler form that is easier to comprehend. A corpus of complex sentences paired with their simplified versions is the first step towards understanding sentence complexity and developing automatic text simplification systems. No such corpus has yet been developed for Urdu; we fill this gap by building one to help initiate research on readability and automatic sentence simplification. We present a lexically and syntactically simplified Urdu corpus, together with a detailed analysis of the various simplification operations. We further analyze our corpora using text readability measures and compare the original, lexically simplified, and syntactically simplified corpora.
- Anthology ID:
- 2020.winlp-1.23
- Volume:
- Proceedings of the Fourth Widening Natural Language Processing Workshop
- Month:
- July
- Year:
- 2020
- Address:
- Seattle, USA
- Venue:
- WiNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 92–95
- URL:
- https://aclanthology.org/2020.winlp-1.23
- DOI:
- 10.18653/v1/2020.winlp-1.23
- Cite (ACL):
- Yusra Anees, Sadaf Abdul Rauf, Nauman Iqbal, and Abdul Basit Siddiqi. 2020. Developing a Monolingual Sentence Simplification Corpus for Urdu. In Proceedings of the Fourth Widening Natural Language Processing Workshop, pages 92–95, Seattle, USA. Association for Computational Linguistics.
- Cite (Informal):
- Developing a Monolingual Sentence Simplification Corpus for Urdu (Anees et al., WiNLP 2020)