Fatima Abu Salem


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2020

pdf bib
Time-Aware Word Embeddings for Three Lebanese News Archives
Jad Doughman | Fatima Abu Salem | Shady Elbassuoni
Proceedings of the Twelfth Language Resources and Evaluation Conference

Word embeddings have proven to be an effective method for capturing semantic relations among distinct terms within a large corpus. In this paper, we present a set of word embeddings learnt from three large Lebanese news archives, which collectively consist of 609,386 scanned newspaper images and spanning a total of 151 years, ranging from 1933 till 2011. The diversified ideological nature of the news archives alongside the temporal variability of the embeddings offer a rare glimpse onto the variation of word representation across the left-right political spectrum. To train the word embeddings, Google’s Tesseract 4.0 OCR engine was employed to transcribe the scanned news archives, and various archive-level as well as decade-level word embeddings were learnt. To evaluate the accuracy of the learnt word embeddings, a benchmark of analogy tasks was used. Finally, we demonstrate an interactive system that allows the end user to visualize for a given word of interest, the variation of the top-k closest words in the embedding space as a function of time and across news archives using an animated scatter plot.