Ehsaneddin Asgari


2024

TuringQ: Benchmarking AI Comprehension in Theory of Computation
Pardis Sadat Zahraei | Ehsaneddin Asgari
Findings of the Association for Computational Linguistics: EMNLP 2024

We present TuringQ, the first benchmark designed to evaluate the reasoning capabilities of large language models (LLMs) in the theory of computation. TuringQ consists of 4,006 undergraduate and graduate-level question-answer pairs, categorized into four difficulty levels and covering seven core theoretical areas. We evaluate several open-source LLMs, as well as GPT-4, using Chain of Thought prompting and expert human assessment. Additionally, we propose an automated LLM-based evaluation system that demonstrates competitive accuracy when compared to human evaluation. Fine-tuning a Llama3-8B model on TuringQ shows measurable improvements in reasoning ability and out-of-domain tasks such as algebra. TuringQ serves as both a benchmark and a resource for enhancing LLM performance in complex computational reasoning tasks. Our analysis offers insights into LLM capabilities and advances in AI comprehension of theoretical computer science.

The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments
Nailia Mirzakhmedova | Johannes Kiesel | Milad Alshomary | Maximilian Heinrich | Nicolas Handke | Xiaoni Cai | Valentin Barriere | Doratossadat Dastgheib | Omid Ghahroodi | MohammadAli SadraeiJavaheri | Ehsaneddin Asgari | Lea Kawaletz | Henning Wachsmuth | Benno Stein
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

While human values play a crucial role in making arguments persuasive, we currently lack the necessary extensive datasets to develop methods for analyzing the values underlying these arguments on a large scale. To address this gap, we present the Touché23-ValueEval dataset, an expansion of the Webis-ArgValues-22 dataset. We collected and annotated an additional 4780 arguments, doubling the dataset’s size to 9324 arguments. These arguments were sourced from six diverse sources, covering religious texts, community discussions, free-text arguments, newspaper editorials, and political debates. Each argument is annotated by three crowdworkers for 54 human values, following the methodology established in the original dataset. The Touché23-ValueEval dataset was utilized in SemEval 2023 Task 4, ValueEval: Identification of Human Values behind Arguments, where an ensemble of transformer models demonstrated state-of-the-art performance. Furthermore, our experiments show that a fine-tuned large language model, Llama-2-7B, achieves comparable results.

Transformers for Bridging Persian Dialects: Transliteration Model for Tajiki and Iranian Scripts
MohammadAli SadraeiJavaheri | Ehsaneddin Asgari | Hamid Reza Rabiee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this study, we address the linguistic challenges posed by Tajiki Persian, a distinct variant of the Persian language that utilizes the Cyrillic script due to historical “Russification”. This distinguishes it from other Persian dialects that adopt the Arabic script. Despite its profound linguistic and cultural significance, Tajiki Persian remains a low-resource language with scant digitized datasets for computational applications. To address this deficiency, we created a parallel corpus using Shahnameh, a seminal Persian epic poem. Employing optical character recognition, we extracted Tajiki Persian verses from primary sources and applied a heuristic method to align them with their Iranian Persian counterparts. We then trained and assessed transliteration models using two prominent sequence-to-sequence architectures: GRU with attention and transformer. Our results underscore the enhanced performance of our models, particularly in contrast to pre-trained large multilingual models like GPT-3.5, emphasizing the value of dedicated datasets in advancing computational approaches for underrepresented languages. With the publication of this work, we are disseminating, for the first time, a vast collection of Persian poetry spanning 1000 years, transcribed in Tajiki scripts for the benefit of the Tajiki-speaking communities. The dataset, along with the model’s code and checkpoints, is accessible at https://github.com/language-ml/Tajiki-Shahname, marking a significant contribution to computational linguistic resources for Tajiki Persian.

AIMA at SemEval-2024 Task 3: Simple Yet Powerful Emotion Cause Pair Analysis
Alireza Ghahramani Kure | Mahshid Dehghani | Mohammad Mahdi Abootorabi | Nona Ghazizadeh | Seyed Arshan Dalili | Ehsaneddin Asgari
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

SemEval-2024 Task 3 presents two subtasks focusing on emotion-cause pair extraction within conversational contexts. Subtask 1 revolves around the extraction of textual emotion-cause pairs, where causes are defined and annotated as textual spans within the conversation. Conversely, Subtask 2 extends the analysis to encompass multimodal cues, including language, audio, and vision, acknowledging instances where causes may not be exclusively represented in the textual data. Our proposed model for emotion-cause analysis is meticulously structured into three core segments: (i) embedding extraction, (ii) cause-pair extraction & emotion classification, and (iii) cause extraction using QA after finding pairs. Leveraging state-of-the-art techniques and fine-tuning on task-specific datasets, our model effectively unravels the intricate web of conversational dynamics and extracts subtle cues signifying causality in emotional expressions. Our team, AIMA, demonstrated strong performance in the SemEval-2024 Task 3 competition. We ranked 10th in Subtask 1 and 6th in Subtask 2 out of 23 teams.

AIMA at SemEval-2024 Task 10: History-Based Emotion Recognition in Hindi-English Code-Mixed Conversations
Mohammad Mahdi Abootorabi | Nona Ghazizadeh | Seyed Arshan Dalili | Alireza Ghahramani Kure | Mahshid Dehghani | Ehsaneddin Asgari
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this study, we introduce a solution to the SemEval 2024 Task 10 on subtask 1, dedicated to Emotion Recognition in Conversation (ERC) in code-mixed Hindi-English conversations. ERC in code-mixed conversations presents unique challenges, as existing models are typically trained on monolingual datasets and may not perform well on code-mixed data. To address this, we propose a series of models that incorporate both the previous and future context of the current utterance, as well as the sequential information of the conversation. To facilitate the processing of code-mixed data, we developed a Hinglish-to-English translation pipeline to translate the code-mixed conversations into English. We designed four different base models, each utilizing powerful pre-trained encoders to extract features from the input but with varying architectures. By ensembling all of these models, we developed a final model that outperforms all other baselines.
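
The abstract above does not state how the four base models are combined. As a minimal sketch, assuming soft voting (averaging the per-class probabilities of each model), an ensemble step could look like the following; the function name `soft_vote` and the toy probabilities are illustrative assumptions, not the authors' implementation.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models (soft voting).

    prob_lists: one list of class probabilities per model, all the same length.
    Returns (index of the winning class, averaged probabilities).
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=avg.__getitem__)
    return best, avg


# Toy outputs of four hypothetical base models for one utterance, three emotion classes.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
    [0.4, 0.4, 0.2],
]
label, avg = soft_vote(model_probs)
print(label, avg)
```

Averaging probabilities rather than counting votes lets a confident model outweigh several uncertain ones; majority voting or a learned combiner would be equally plausible readings of "ensembling".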

HierarchyEverywhere at SemEval-2024 Task 4: Detection of Persuasion Techniques in Memes Using Hierarchical Text Classifier
Omid Ghahroodi | Ehsaneddin Asgari
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Text classification is an important task in natural language processing. Hierarchical Text Classification (HTC) is a subtype of text classification. HTC tackles multi-label classification challenges by leveraging tree structures that delineate relationships between classes, thereby striving to enhance classification accuracy through the utilization of inter-class relationships. Memes, as prevalent vehicles of modern communication within social networks, hold immense potential as instruments for propagandistic dissemination due to their profound impact on users. In SemEval-2024 Task 4, the identification of propaganda and its various forms in memes is explored through two sub-tasks: (i) utilizing only the textual component of memes, and (ii) incorporating both textual and pictorial elements. In this study, we address the proposed problem through the lens of HTC, using state-of-the-art hierarchical text classification methodologies to detect propaganda in memes. Our system achieved first place in English Sub-task 2a, underscoring its efficacy in tackling the complexities inherent in propaganda detection within the meme landscape.
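
One common way to use a class tree in hierarchical classification is to make every predicted label imply its ancestors, so that predictions stay consistent with the hierarchy. The sketch below illustrates only that idea; the hierarchy and the names `PARENT`, `group_a`, and `leaf_1` are placeholders and do not reproduce the task's label set or the authors' system.

```python
# Hypothetical toy hierarchy: each node maps to its parent; "root" has none.
PARENT = {
    "leaf_1": "group_a",
    "leaf_2": "group_a",
    "leaf_3": "group_b",
    "group_a": "root",
    "group_b": "root",
}


def with_ancestors(labels, parent=PARENT):
    """Return the given labels plus every ancestor of each label in the tree."""
    expanded = set()
    for label in labels:
        node = label
        while node is not None:
            expanded.add(node)
            node = parent.get(node)  # None once we step past the root
    return expanded


print(sorted(with_ancestors({"leaf_1", "leaf_3"})))
```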

2023

The Language Model, Resources, and Computational Pipelines for the Under-Resourced Iranian Azerbaijani
Marzia Nouri | Mahsa Amani | Reihaneh Zohrabi | Ehsaneddin Asgari
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Borderless Azerbaijani Processing: Linguistic Resources and a Transformer-based Approach for Azerbaijani Transliteration
Reihaneh Zohrabi | Mostafa Masumi | Omid Ghahroodi | Parham AbedAzad | Hamid Beigy | Mohammad Hossein Rohban | Ehsaneddin Asgari
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation
Zeinab Taghavi | Parsa Haghighi Naeini | Mohammad Ali Sadraei Javaheri | Soroush Gooran | Ehsaneddin Asgari | Hamid Reza Rabiee | Hossein Sameti
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” using a zero-shot learning setting, where we compared the accuracy of different combinations of tools, including “Simple prompt-based” methods and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach reaches better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75% in our best attempt.

SUT at SemEval-2023 Task 1: Prompt Generation for Visual Word Sense Disambiguation
Omid Ghahroodi | Seyed Arshan Dalili | Sahel Mesforoush | Ehsaneddin Asgari
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Visual Word Sense Disambiguation (V-WSD) identifies the correct visual sense of a multi-sense word in a specific context. This can be challenging as images may need to provide additional context and words may have multiple senses. A proper V-WSD system can benefit applications like image retrieval and captioning. This paper proposes a Prompt Generation approach to solve this challenge. This approach improves the robustness of language-image models like CLIP to contextual ambiguities and helps them better correlate between textual and visual contexts of different senses of words.

Sina at SemEval-2023 Task 4: A Class-Token Attention-based Model for Human Value Detection
Omid Ghahroodi | Mohammad Ali Sadraei Javaheri | Doratossadat Dastgheib | Mahdieh Soleymani Baghshah | Mohammad Hossein Rohban | Hamid Rabiee | Ehsaneddin Asgari
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The human values expressed in argumentative texts can provide valuable insights into the culture of a society. They can be helpful in various applications such as value-based profiling and ethical analysis. However, one of the first steps in achieving this goal is to detect the category of human value from an argument accurately. This task is challenging due to the lack of data and the need for philosophical inference. It can also be challenging for humans to classify arguments according to their underlying human values. This paper elaborates on our model for the SemEval 2023 Task 4 on human value detection. We propose a class-token attention-based model and evaluate it against baseline models, including a fine-tuned BERT language model and a keyword-based approach.

SinaAI at SemEval-2023 Task 3: A Multilingual Transformer Language Model-based Approach for the Detection of News Genre, Framing and Persuasion Techniques
Aryan Sadeghi | Reza Alipour | Kamyar Taeb | Parimehr Morassafar | Nima Salemahim | Ehsaneddin Asgari
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes SinaAI’s participation in SemEval-2023 Task 3, which involves detecting propaganda in news articles across multiple languages. The task comprises three sub-tasks: (i) genre detection, (ii) news framing, and (iii) persuasion technique identification. The employed dataset includes news articles in nine languages and domains, including English, French, Italian, German, Polish, Russian, Georgian, Greek, and Spanish, with labeled instances of news framing, genre, and persuasion techniques. Our approach combines fine-tuning multilingual language models such as XLM, LaBSE, and mBERT with data augmentation techniques. Our experimental results show that XLM outperforms other models in terms of F1-Micro and F1-Macro, and the ensemble of XLM and LaBSE achieved the best performance. Our study highlights the effectiveness of multilingual sentence embedding models in multilingual propaganda detection. Our models achieved the highest score for two languages (Greek and Italian) in sub-task 1 and one language (Russian) in sub-task 2.
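
For readers unfamiliar with the two metrics named above: F1-Micro pools true positives, false positives, and false negatives over all classes before computing F1, while F1-Macro computes F1 per class and averages the results, so rare classes weigh as much as frequent ones. A minimal sketch on toy data follows; the labels "A" and "B" are placeholders, not the task's label set.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold, pred: lists of label sets, one set per document."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in p & g:
            tp[label] += 1
        for label in p - g:
            fp[label] += 1
        for label in g - p:
            fn[label] += 1
    labels = set(tp) | set(fp) | set(fn)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro


# A classifier that always predicts the frequent class "A".
gold = [{"A"}, {"A"}, {"A"}, {"B"}]
pred = [{"A"}, {"A"}, {"A"}, {"A"}]
micro, macro = micro_macro_f1(gold, pred)
print(micro, macro)
```

In this toy case the micro score stays high because most documents belong to "A", while the macro score drops because the rare class "B" is never predicted.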

2022

Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging
Sajad Mirzababaei | Amir Hossein Kargaran | Hinrich Schütze | Ehsaneddin Asgari
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Many major NLP tasks benefit from an accurate understanding of temporal expressions, e.g., text summarization, question answering, and information retrieval. This paper introduces Hengam, an adversarially trained transformer for Persian temporal tagging that outperforms state-of-the-art approaches on a diverse and manually created dataset. We create Hengam in the following concrete steps: (1) we develop HengamTagger, an extensible rule-based tool that can extract temporal expressions from a set of diverse language-specific patterns for any language of interest. (2) We apply HengamTagger to annotate temporal tags in a large and diverse Persian text collection (covering both formal and informal contexts) to be used as weakly labeled data. (3) We introduce an adversarially trained transformer model on HengamCorpus that can generalize over the HengamTagger’s rules. We create HengamGold, the first high-quality gold standard for Persian temporal tagging. Our trained adversarial HengamTransformer not only achieves the best performance in terms of the F1-score (a type F1-Score of 95.42 and a partial F1-Score of 91.60) but also successfully deals with language ambiguities and incorrect spellings. Our code, data, and models are publicly available at https://github.com/kargaranamir/Hengam.

Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval
Sayed Hesam Alavian | Ali Satvaty | Sadra Sabouri | Ehsaneddin Asgari | Hossein Sameti
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

Information-seeking dialogue systems, including knowledge identification and response generation, aim to respond to users with fluent, coherent, and informative answers based on users’ needs. This paper discusses our proposed approach, Docalog, for the DialDoc-22 (MultiDoc2Dial) shared task. Docalog identifies the most relevant knowledge in the associated document, in a multi-document setting. Docalog is a three-stage pipeline consisting of (1) a document retriever model (DR. TEIT), (2) an answer span prediction model, and (3) an ultimate span picker deciding on the most likely answer span, out of all predicted spans. In the test phase of MultiDoc2Dial 2022, Docalog achieved F1-scores of 36.07% and 28.44% and SacreBLEU scores of 23.70% and 20.52%, respectively on the MDD-SEEN and MDD-UNSEEN folds.

Keyword-based Natural Language Premise Selection for an Automatic Mathematical Statement Proving
Doratossadat Dastgheib | Ehsaneddin Asgari
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

Extraction of supportive premises for a mathematical problem can contribute to profound success in improving automatic reasoning systems. One bottleneck in automated theorem proving is the lack of a proper semantic information retrieval system for mathematical texts. In this paper, we show the effect of keyword extraction in the natural language premise selection (NLPS) shared task proposed in TextGraphs-16 that seeks to select the most relevant sentences supporting a given mathematical statement.

2021

KnowMAN: Weakly Supervised Multinomial Adversarial Networks
Luisa März | Ehsaneddin Asgari | Fabienne Braune | Franziska Zimmermann | Benjamin Roth
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The absence of labeled data for training neural models is often addressed by leveraging knowledge about the specific task, resulting in heuristic but noisy labels. The knowledge is captured in labeling functions, which detect certain regularities or patterns in the training samples and annotate corresponding labels for training. This process of weakly supervised training may result in an over-reliance on the signals captured by the labeling functions and hinder models from exploiting other signals or generalizing well. We propose KnowMAN, an adversarial scheme that makes it possible to control the influence of signals associated with specific labeling functions. KnowMAN forces the network to learn representations that are invariant to those signals and to pick up other signals that are more generally associated with an output label. KnowMAN strongly improves results compared to direct weakly supervised learning with a pre-trained transformer language model and a feature-based baseline.

2020

UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages
Ehsaneddin Asgari | Fabienne Braune | Benjamin Roth | Christoph Ringlstetter | Mohammad Mofrad
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we introduce UniSent, universal sentiment lexica for 1000+ languages. Sentiment lexica are vital for sentiment analysis in the absence of document-level annotations, a very common scenario for low-resource languages. To the best of our knowledge, UniSent is the largest sentiment resource to date in terms of the number of covered languages, including many low-resource ones. In this work, we use a massively parallel Bible corpus to project sentiment information from English to other languages for sentiment analysis on Twitter data. We introduce a method called DomDrift to mitigate the huge domain mismatch between Bible and Twitter by a confidence weighting scheme that uses domain-specific embeddings to compare the nearest neighbors for a candidate sentiment word in the source (Bible) and target (Twitter) domain. We evaluate the quality of UniSent in a subset of languages for which manually created ground truth was available: Macedonian, Czech, German, Spanish, and French. We show that the quality of UniSent is comparable to manually created sentiment resources when it is used as the sentiment seed for the task of word sentiment prediction on top of embedding representations. In addition, we show that emoticon sentiments could be reliably predicted in the Twitter domain using only UniSent and monolingual embeddings in German, Spanish, French, and Italian. With the publication of this paper, we release the UniSent sentiment lexica at http://language-lab.info/unisent.
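
The abstract describes DomDrift only as comparing a word's nearest neighbors in source-domain and target-domain embeddings. A rough illustration of that idea, assuming Jaccard overlap of the top-k neighbor sets as the confidence weight, is sketched below; the overlap measure, the value of k, the function names, and the toy vectors are all assumptions rather than the published method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, embeddings, k):
    """The k words most similar to `word` within one embedding space."""
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(embeddings[word], embeddings[w]), reverse=True)
    return set(others[:k])


def neighbor_overlap(word, source_emb, target_emb, k=2):
    """Jaccard overlap of a word's neighbor sets in the two domains (1.0 = no drift)."""
    src = nearest_neighbors(word, source_emb, k)
    tgt = nearest_neighbors(word, target_emb, k)
    union = src | tgt
    return len(src & tgt) / len(union) if union else 0.0


# Toy 2-d embeddings: "sick" sits near negative words in one domain, positive in the other.
source = {"good": [1.0, 0.1], "great": [0.9, 0.2], "bad": [-1.0, 0.1],
          "awful": [-0.9, 0.2], "sick": [-0.8, 0.3]}
target = {"good": [1.0, 0.1], "great": [0.9, 0.2], "bad": [-1.0, 0.1],
          "awful": [-0.9, 0.2], "sick": [0.8, 0.3]}

print(neighbor_overlap("good", source, target), neighbor_overlap("sick", source, target))
```

Because neighbors are computed separately inside each space, this sketch needs no alignment between the two embedding spaces; a low overlap would down-weight the projected sentiment of that word in the target domain.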

EmbLexChange at SemEval-2020 Task 1: Unsupervised Embedding-based Detection of Lexical Semantic Changes
Ehsaneddin Asgari | Christoph Ringlstetter | Hinrich Schütze
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes EmbLexChange, a system introduced by the “Life-Language” team for SemEval-2020 Task 1, on unsupervised detection of lexical-semantic changes. EmbLexChange is defined as the divergence between the embedding-based profiles of word w (calculated with respect to a set of reference words) in the source and the target domains (source and target domains can be simply two time frames t_1 and t_2). The underlying assumption is that the lexical-semantic change of word w would affect its co-occurring words and subsequently alter the neighborhoods in the embedding spaces. We show that using a resampling framework for the selection of reference words (with conserved senses), we can more reliably detect lexical-semantic changes in English, German, Swedish, and Latin. EmbLexChange achieved second place in the binary detection of semantic changes in SemEval-2020.
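
The definition above (a divergence between profiles of word w computed against reference words in two domains) can be illustrated with a small sketch. It assumes that a profile is a softmax over cosine similarities to the reference words and that the divergence is KL; both choices, along with the function names and toy vectors, are assumptions made for illustration, and the resampling of reference words is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def profile(word, references, emb):
    """Softmax-normalized similarities of `word` to each reference word."""
    exps = [math.exp(cosine(emb[word], emb[r])) for r in references]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def change_score(word, references, emb_t1, emb_t2):
    """Divergence between the word's profiles in the two time frames."""
    return kl_divergence(profile(word, references, emb_t1),
                         profile(word, references, emb_t2))


# Toy embeddings for two time frames; only "bank" moves between them.
references = ["water", "money"]
emb_t1 = {"water": [1.0, 0.0], "money": [0.0, 1.0], "bank": [0.9, 0.1], "stone": [0.5, 0.5]}
emb_t2 = {"water": [1.0, 0.0], "money": [0.0, 1.0], "bank": [0.1, 0.9], "stone": [0.5, 0.5]}

print(change_score("bank", references, emb_t1, emb_t2),
      change_score("stone", references, emb_t1, emb_t2))
```

Each profile is computed inside its own embedding space, so the two spaces do not have to be aligned; only the reference words need to keep their senses across time frames.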

2017

Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages
Ehsaneddin Asgari | Hinrich Schütze
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting – to the best of our knowledge – the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: We only require that a linguistic feature is overtly marked in a few of thousands of languages as opposed to requiring that it be marked in all languages under investigation.

2016

Text Analysis and Automatic Triage of Posts in a Mental Health Forum
Ehsaneddin Asgari | Soroush Nasiriany | Mohammad R.K. Mofrad
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance
Ehsaneddin Asgari | Mohammad R.K. Mofrad
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP

2013

Linguistic Resources and Topic Models for the Analysis of Persian Poems
Ehsaneddin Asgari | Jean-Cédric Chappelier
Proceedings of the Workshop on Computational Linguistics for Literature