Mehwish Alam


2022

TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task
Fouad Shammary | Yiyi Chen | Zsolt T Kardkovacs | Mehwish Alam | Haithem Afli
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

This study targets the shared task on Nuanced Arabic Dialect Identification (NADI) organized with the Workshop on Arabic Natural Language Processing (WANLP). It focuses on Subtask 1, the identification of Arabic dialects at the country level. More specifically, it first studies the impact of a traditional approach, TF-IDF, and then the impact of advanced deep learning based methods. These methods include full fine-tuning of MARBERT as well as adapter-based fine-tuning of MARBERT, with and without data augmentation. The evaluation shows that the traditional TF-IDF approach scores best in terms of accuracy on the TEST-A dataset, while the fine-tuned MARBERT with an adapter on augmented data scores second in Macro F1-score on the TEST-B dataset. As a result, the proposed system ranked second in the shared task on average.

Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences
Haithem Afli | Mehwish Alam | Houda Bouamor | Cristina Blasi Casagran | Colleen Boland | Sahar Ghannay
Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences

On the Impact of Temporal Representations on Metaphor Detection
Giorgio Ottolina | Matteo Luigi Palmonari | Manuel Vimercati | Mehwish Alam
Proceedings of the Thirteenth Language Resources and Evaluation Conference

State-of-the-art approaches for metaphor detection compare the literal (or core) meaning of expressions with their contextual meaning, using metaphor classifiers based on neural networks. However, metaphorical expressions evolve over time for various reasons, such as cultural and societal impact. Metaphorical expressions are known to co-evolve with language and literal word meanings, and even drive, to some extent, this evolution. This raises the question of whether different, possibly time-specific, representations of literal meanings may impact the metaphor detection task. To the best of our knowledge, this is the first study that examines the metaphor detection task with a detailed exploratory analysis in which different temporal and static word embeddings are used to account for different representations of literal meanings. Our experimental analysis is based on three popular benchmarks used for metaphor detection and on word embeddings extracted from different corpora and temporally aligned using different state-of-the-art approaches. The results suggest that the choice of static word embedding method does impact the metaphor detection task and that some temporal word embeddings slightly outperform static methods. However, the results also suggest that temporal word embeddings may provide representations of the core meaning of a metaphor that are too close to its contextual meaning, thus confusing the classifier. Overall, the interaction between temporal language evolution and metaphor detection appears small in the benchmark datasets used in our experiments. This suggests that future work on the computational analysis of this important linguistic phenomenon should start by creating a new dataset in which this interaction is better represented.

2010

PDTB XML: the XMLization of the Penn Discourse TreeBank 2.0
Xuchen Yao | Irina Borisova | Mehwish Alam
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The current study presents a conversion and unification of the Penn Discourse TreeBank 2.0 (PDTB) and the Penn TreeBank (PTB) in XML format. The main goal of the PDTB XML is to create a tool for efficient and broad querying of syntax and discourse information simultaneously. The key stages of the project are developing proper cross-references between different data types and their representation in the modified TIGER-XML format, and then writing the required declarative specifications (XML Schema). PTB XML is compatible with the TIGER-XML format. The PDTB XML is developed as a unified format for the convenience of XQuery users; it integrates discourse relations and XML structures into one unified hierarchy and builds the cross-references between the syntactic trees and the discourse relations. The syntactic and discourse elements are assigned unique IDs in order to build cross-references between them. The converted corpus allows for a simultaneous search for syntactically specified discourse information based on the XQuery standard, which is illustrated with a simple example in the article.