Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization

Muhammad Humayoun; Hwanjo Yu

Abstract

Preprocessing is a preliminary step in many fields, including information retrieval (IR) and natural language processing (NLP). The effect of basic preprocessing settings on text summarization is well studied for English. However, to the best of our knowledge, no such study exists for Urdu. In this study, we analyze the effect of basic preprocessing settings on single-document text summarization for Urdu through experiments on a benchmark corpus. The analysis uses state-of-the-art extractive summarization algorithms and examines the effect of stopword removal, lemmatization, and stemming. The results show that these preprocessing settings improve the summarization results.
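To make the preprocessing settings named in the abstract concrete, the following is a minimal sketch of how stopword removal and an optional stemming or lemmatization step could feed a simple frequency-based extractive summarizer. It is not the paper's method or any of the algorithms it evaluates: the function names `preprocess`, `summarize`, and `toy_stem`, the regex tokenizer, the scoring rule, and the English toy data are all assumptions chosen for illustration. Real Urdu text would need a proper tokenizer, stopword list, and stemmer or lemmatizer.

```python
import re
from collections import Counter


def toy_stem(token):
    """Hypothetical stand-in for a stemmer: strips one trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def preprocess(sentence, stopwords, normalize=None):
    """Lowercase, tokenize crudely, drop stopwords, optionally normalize tokens.

    The regex tokenizer is a placeholder (assumption); it is not adequate
    for real Urdu text, e.g. it splits on combining diacritics.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    tokens = [t for t in tokens if t not in stopwords]
    if normalize is not None:
        tokens = [normalize(t) for t in tokens]
    return tokens


def summarize(sentences, stopwords, normalize=None, k=1):
    """Pick the k sentences with the highest average term frequency.

    Term frequencies are counted over the whole document after
    preprocessing; selected sentences are returned in document order.
    """
    processed = [preprocess(s, stopwords, normalize) for s in sentences]
    freq = Counter(t for toks in processed for t in toks)
    scores = [
        sum(freq[t] for t in toks) / len(toks) if toks else 0.0
        for toks in processed
    ]
    # sorted() is stable, so ties keep their original document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ["The weather is nice.", "Cats chase mice.", "The cat and the mice hide."]
    stop = {"the", "is", "and"}
    print(summarize(doc, stop, k=1))
    print(summarize(doc, stop, normalize=toy_stem, k=2))
```

Passing `normalize=None` corresponds to running with stopword removal only, while supplying a stemmer or lemmatizer as `normalize` merges inflected forms before scoring, which is the kind of setting the study compares.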

Anthology ID: L16-1585
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 3686–3693
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/L16-1585/
Cite (ACL): Muhammad Humayoun and Hwanjo Yu. 2016. Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3686–3693, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization (Humayoun & Yu, LREC 2016)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/L16-1585.pdf
Code: humsha/USCorpus
Data: CC-News
