Heshaam Faili


2023

PMI-Align: Word Alignment With Point-Wise Mutual Information Without Requiring Parallel Training Data
Fatemeh Azadi | Heshaam Faili | Mohammad Javad Dousti
Findings of the Association for Computational Linguistics: ACL 2023

Word alignment has many applications, including cross-lingual annotation projection, bilingual lexicon extraction, and the evaluation or analysis of translation outputs. Recent studies show that contextualized embeddings from pre-trained multilingual language models can yield high-quality word alignments without the need for parallel training data. In this work, we propose PMI-Align, which computes the point-wise mutual information between source and target tokens and uses it to extract word alignments, instead of the cosine similarity or dot product mostly used in recent approaches. Our experiments show that PMI-Align can outperform the rival methods on five out of six language pairs. Although our approach requires no parallel training data, we show that it can also benefit approaches that use parallel data to fine-tune pre-trained language models on word alignments. Our code and data are publicly available.
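To illustrate the idea of scoring token pairs by point-wise mutual information rather than raw similarity, here is a minimal sketch. It is not the authors' implementation: the function names `pmi_matrix` and `align`, the softmax over all cells used to form a joint distribution, and the bidirectional-argmax extraction rule are all assumptions made for illustration.

```python
import numpy as np

def pmi_matrix(sim):
    """Turn a (source x target) similarity matrix into a PMI matrix.

    Assumption: the joint probability p(x, y) is a softmax over all
    cells of `sim`; the paper may define it differently.
    """
    sim = np.asarray(sim, dtype=float)
    joint = np.exp(sim - sim.max())             # subtract max for numerical stability
    joint /= joint.sum()                        # p(x, y), sums to 1 over all pairs
    p_src = joint.sum(axis=1, keepdims=True)    # marginal p(x), shape (m, 1)
    p_tgt = joint.sum(axis=0, keepdims=True)    # marginal p(y), shape (1, n)
    # PMI(x, y) = log p(x, y) - log p(x) - log p(y)
    return np.log(joint) - np.log(p_src) - np.log(p_tgt)

def align(sim):
    """Keep pairs (i, j) that are each other's best match under PMI.

    Assumption: a simple intersection of the two argmax directions.
    """
    pmi = pmi_matrix(sim)
    best_tgt = pmi.argmax(axis=1)   # best target index for each source token
    best_src = pmi.argmax(axis=0)   # best source index for each target token
    return [(i, int(j)) for i, j in enumerate(best_tgt) if best_src[j] == i]

# Hypothetical similarity scores for a 3-token source and 3-token target.
example_sim = [[5.0, 0.0, 0.0],
               [0.0, 5.0, 0.0],
               [0.0, 0.0, 5.0]]
alignment = align(example_sim)
```

In practice the similarity matrix would come from the dot products of contextualized token embeddings; the toy matrix above only exercises the arithmetic.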

2022

PerCQA: Persian Community Question Answering Dataset
Naghme Jamali | Yadollah Yaghoobzadeh | Heshaam Faili
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Community Question Answering (CQA) forums provide answers to many real-life questions. These forums are popular among machine learning researchers due to their large size. Automatic answer selection, answer ranking, question retrieval, expert finding, and fact-checking are examples of learning tasks performed using CQA data. This paper presents PerCQA, the first Persian dataset for CQA. This dataset contains the questions and answers crawled from the most well-known Persian forum. After data acquisition, we develop rigorous annotation guidelines in an iterative process and then annotate the question-answer pairs in SemEvalCQA format. PerCQA contains 989 questions and 21,915 annotated answers. We make PerCQA publicly available to encourage more research in Persian CQA. We also build strong benchmarks for the task of answer selection in PerCQA by using mono- and multi-lingual pre-trained language models.

2021

NSURL-2021 Shared Task 1: Semantic Relation Extraction in Persian
Nasrin Taghizadeh | Ali Ebrahimi | Heshaam Faili
Proceedings of the Second International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2021) co-located with ICNLSP 2021

PerSpellData: An Exhaustive Parallel Spell Dataset For Persian
Romina Oji | Nasrin Taghizadeh | Heshaam Faili
Proceedings of the Second International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2021) co-located with ICNLSP 2021

NLP-IIS@UT at SemEval-2021 Task 4: Machine Reading Comprehension using the Long Document Transformer
Hossein Basafa | Sajad Movahedi | Ali Ebrahimi | Azadeh Shakery | Heshaam Faili
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents a technical report of our submission to Task 4 of SemEval-2021, titled Reading Comprehension of Abstract Meaning. In this task, we want to predict the correct answer to a question given a context. Contexts are usually very long and require a large receptive field from the model. Thus, common contextualized language models like BERT lose representation quality and performance because of the limited number of input tokens they can process. To tackle this problem, we used the Longformer model to better process the sequences. Furthermore, we utilized the method proposed in the Longformer benchmark on the WikiHop dataset, which improved the accuracy on our task data from 23.01% and 22.95%, achieved by the baselines for subtasks 1 and 2, respectively, to 70.30% and 64.38%.

2016

Improving Word Alignment of Rare Words with Word Embeddings
Masoud Jalili Sabet | Heshaam Faili | Gholamreza Haffari
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We address the problem of inducing word alignment for language pairs by developing an unsupervised model that can be applied to other generative alignment models. We approach the task by: i) proposing a new alignment model based on IBM alignment model 1 that uses vector representations of words, and ii) examining the use of similar source words to overcome the problem of rare source words and improve the alignments. We apply our method to English-French corpora and run the experiments with different numbers of sentence pairs. Our results show competitive performance against the baseline and in some cases improve the results by up to 6.9% in terms of precision.

2015

On the Importance of Ezafe Construction in Persian Parsing
Alireza Nourian | Mohammad Sadegh Rasooli | Mohsen Imany | Heshaam Faili
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

A Probabilistic Approach to Persian Ezafe Recognition
Habibollah Asghari | Jalal Maleki | Heshaam Faili
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

Supervised Morphology Generation Using Parallel Corpus
Alireza Mahmoudi | Mohsen Arabsorkhi | Heshaam Faili
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Discourse-aware Statistical Machine Translation as a Context-sensitive Spell Checker
Behzad Mirzababaei | Heshaam Faili | Nava Ehsan
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Automatic Enhancement of LTAG Treebank
Farzaneh Zarei | Ali Basirat | Heshaam Faili | Maryam Sadat Mirian
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

Collocation Extraction using Parallel Corpus
Kavosh Asadi Atui | Heshaam Faili | Kaveh Assadi Atuie
Proceedings of COLING 2012: Posters

Fast Unsupervised Dependency Parsing with Arc-Standard Transitions
Mohammad Sadegh Rasooli | Heshaam Faili
Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP

2011

Constructing Linguistically Motivated Structures from Statistical Grammars
Ali Basirat | Heshaam Faili
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

Unsupervised Learning for Persian WordNet Construction
Mortaza Montazery | Heshaam Faili
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Automatic Persian WordNet Construction
Mortaza Montazery | Heshaam Faili
Coling 2010: Posters

2009

From Partial toward Full Parsing
Heshaam Faili
Proceedings of the International Conference RANLP-2009