Heshaam Faili

2023

PMI-Align: Word Alignment With Point-Wise Mutual Information Without Requiring Parallel Training Data
Fatemeh Azadi | Heshaam Faili | Mohammad Javad Dousti
Findings of the Association for Computational Linguistics: ACL 2023

Word alignment has many applications, including cross-lingual annotation projection, bilingual lexicon extraction, and the evaluation or analysis of translation outputs. Recent studies show that contextualized embeddings from pre-trained multilingual language models can yield high-quality word alignments without the need for parallel training data. In this work, we propose PMI-Align, which computes the point-wise mutual information between source and target tokens and uses it to extract word alignments, instead of the cosine similarity or dot product mostly used in recent approaches. Our experiments show that PMI-Align outperforms the rival methods on five out of six language pairs. Although our approach requires no parallel training data, we show that it can also benefit approaches that use parallel data to fine-tune pre-trained language models on word alignments. Our code and data are publicly available.
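As a rough illustration of scoring token pairs with point-wise mutual information rather than raw similarity, the sketch below uses the standard definition PMI(x, y) = log p(x, y) / (p(x) p(y)). It is not the authors' implementation: the function names `pmi_matrix` and `mutual_argmax`, the softmax over all cells used to obtain a joint distribution, and the mutual-argmax extraction rule are assumptions made for the example.

```python
import math

def pmi_matrix(sim):
    """sim[i][j]: similarity of source token i and target token j (hypothetical input)."""
    # Assumption: a softmax over all cells turns the scores into a joint distribution.
    top = max(max(row) for row in sim)
    exp = [[math.exp(s - top) for s in row] for row in sim]
    total = sum(sum(row) for row in exp)
    joint = [[e / total for e in row] for row in exp]
    # Marginals over source tokens (rows) and target tokens (columns).
    p_src = [sum(row) for row in joint]
    p_tgt = [sum(col) for col in zip(*joint)]
    # Standard PMI: log of the joint over the product of the marginals.
    return [[math.log(joint[i][j] / (p_src[i] * p_tgt[j]))
             for j in range(len(p_tgt))]
            for i in range(len(p_src))]

def mutual_argmax(scores):
    """Keep (i, j) when j is the best column for row i and i is the best row for column j."""
    row_best = [max(range(len(r)), key=r.__getitem__) for r in scores]
    col_best = [max(range(len(c)), key=c.__getitem__) for c in zip(*scores)]
    return [(i, j) for i, j in enumerate(row_best) if col_best[j] == i]

# Toy similarity scores for a two-token source and a two-token target sentence.
sim = [[5.0, 1.0],
       [1.0, 5.0]]
alignments = mutual_argmax(pmi_matrix(sim))
```

In this toy case the diagonal pairs receive positive PMI and the off-diagonal pairs negative PMI, so `alignments` contains the two diagonal pairs.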

2022

PerCQA: Persian Community Question Answering Dataset
Naghme Jamali | Yadollah Yaghoobzadeh | Heshaam Faili
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Community Question Answering (CQA) forums provide answers to many real-life questions. Because of their large size, these forums are popular among machine learning researchers. Automatic answer selection, answer ranking, question retrieval, expert finding, and fact-checking are examples of learning tasks performed using CQA data. This paper presents PerCQA, the first Persian dataset for CQA. The dataset contains questions and answers crawled from the most well-known Persian forum. After data acquisition, we develop rigorous annotation guidelines through an iterative process and then annotate the question-answer pairs in SemEvalCQA format. PerCQA contains 989 questions and 21,915 annotated answers. We make PerCQA publicly available to encourage more research in Persian CQA. We also build strong benchmarks for the task of answer selection in PerCQA using mono- and multi-lingual pre-trained language models.

2021

NSURL-2021 Shared Task 1: Semantic Relation Extraction in Persian
Nasrin Taghizadeh | Ali Ebrahimi | Heshaam Faili
Proceedings of the Second International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2021) co-located with ICNLSP 2021

PerSpellData: An Exhaustive Parallel Spell Dataset For Persian
Romina Oji | Nasrin Taghizadeh | Heshaam Faili
Proceedings of the Second International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2021) co-located with ICNLSP 2021

NLP-IIS@UT at SemEval-2021 Task 4: Machine Reading Comprehension using the Long Document Transformer
Hossein Basafa | Sajad Movahedi | Ali Ebrahimi | Azadeh Shakery | Heshaam Faili
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents a technical report of our submission to Task 4 of SemEval-2021, Reading Comprehension of Abstract Meaning. In this task, we want to predict the correct answer to a question given a context. Contexts are usually very long and require a large receptive field from the model. Common contextualized language models like BERT therefore lose fine-grained representation and performance because of their limited input-token capacity. To tackle this problem, we used the Longformer model to better process the sequences. Furthermore, we utilized the method proposed in the Longformer benchmark on the WikiHop dataset, which improved the accuracy on our task data from 23.01% and 22.95%, achieved by the baselines for subtasks 1 and 2, respectively, to 70.30% and 64.38%.

2016

Improving Word Alignment of Rare Words with Word Embeddings
Masoud Jalili Sabet | Heshaam Faili | Gholamreza Haffari
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We address the problem of inducing word alignment for language pairs by developing an unsupervised model that can be applied to other generative alignment models. We approach the task by: i) proposing a new alignment model based on IBM alignment model 1 that uses vector representations of words, and ii) examining the use of similar source words to overcome the problem of rare source words and improve the alignments. We apply our method to English-French corpora and run experiments with different numbers of sentence pairs. Our results show competitive performance against the baseline and in some cases improve the results by up to 6.9% in terms of precision.
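For readers unfamiliar with the baseline, the sketch below shows plain IBM Model 1 trained with expectation-maximization. It does not include the word-embedding extension or the rare-word handling described in the abstract, and it omits the NULL source word for brevity. The function name `train_ibm1` and the toy sentence pairs are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """pairs: list of (source_tokens, target_tokens).
    Returns t, where t[(f, e)] estimates p(target word f | source word e)."""
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (f, e)
        total = defaultdict(float)  # expected counts for e
        for src, tgt in pairs:
            for f in tgt:
                # E-step: spread one count for f over the source words.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts per source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Toy English-French sentence pairs (illustrative only).
pairs = [
    (["the", "house"], ["la", "maison"]),
    (["the", "book"], ["le", "livre"]),
    (["a", "house"], ["une", "maison"]),
]
t = train_ibm1(pairs)
```

Because "maison" co-occurs with "house" in two of the three pairs, the trained table assigns it a higher probability given "house" than given "the".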
