Hossam Ibrahim


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2008

pdf bib
Automatic Extraction of Textual Elements from News Web Pages
Hossam Ibrahim | Kareem Darwish | Abdel-Rahim Madany
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present an algorithm for automatic extraction of textual elements, namely titles and full text, associated with news stories in news web pages. We propose a supervised machine learning classification technique based on the use of a Support Vector Machine (SVM) classifier to extract the desired textual elements. The technique uses internal structural features of a webpage without relying on the Document Object Model to which many content authors fail to adhere. The classifier uses a set of features which rely on the length of text, the percentage of hypertext, etc. The resulting classifier is nearly perfect on previously unseen news pages from different sites. The proposed technique is successfully employed in Alzoa.com, which is the largest Arabic news aggregator on the web.