Partha Pakray


2021

Improved English to Hindi Multimodal Neural Machine Translation
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Darsh Kaushik | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

Machine translation performs automatic translation from one natural language to another. Neural machine translation (NMT) is the state-of-the-art approach in machine translation, but it requires adequate training data, which is a severe problem when translating low-resource language pairs. Multimodal NMT merges textual features with visual features to improve translation for low-resource pairs. WAT2021 (Workshop on Asian Translation 2021) organized a shared task of multimodal translation for English to Hindi. We participated under the team name CNLP-NITS-PP with two submissions: multimodal and text-only NMT. This work investigates phrase pair injection via a data augmentation approach and improves on our previous work at WAT2020 on the same task in both text-only and multimodal NMT. We achieved second rank on the challenge test set for English to Hindi multimodal translation, with a Bilingual Evaluation Understudy (BLEU) score of 39.28, a Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.792097, and an Adequacy-Fluency Metrics (AMFM) score of 0.830230.

EnKhCorp1.0: An English–Khasi Corpus
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Darsh Kaushik | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

In machine translation, corpus preparation is one of the crucial tasks, particularly for low-resource pairs. In multilingual countries like India, machine translation plays a vital role in communication among people with various linguistic backgrounds. Online automatic translation systems from Google and Microsoft cover many languages but lack support for Khasi, which can therefore be considered a low-resource language. This paper describes the development of EnKhCorp1.0, a corpus for the English–Khasi pair, and baseline systems for English-to-Khasi and Khasi-to-English translation based on the neural machine translation approach.

Neural Machine Translation for Tamil–Telugu Pair
Sahinur Rahman Laskar | Bishwaraj Paul | Prottay Kumar Adhikary | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Sixth Conference on Machine Translation

The neural machine translation approach has gained popularity in machine translation because of its ability to analyze context and handle long-term dependencies. We participated in the WMT21 shared task of similar language translation on the Tamil-Telugu pair with the team name CNLP-NITS. In this task, we used monolingual data via pretrained word embeddings in a transformer-based neural machine translation model to tackle the limited size of the parallel corpus. Our model achieved a bilingual evaluation understudy (BLEU) score of 4.05, a rank-based intuitive bilingual evaluation score (RIBES) of 24.80, and a translation edit rate (TER) score of 97.24 for both Tamil-to-Telugu and Telugu-to-Tamil translations.

CNLP-NITS @ LongSumm 2021: TextRank Variant for Generating Long Summaries
Darsh Kaushik | Abdullah Faiz Ur Rahman Khilji | Utkarsh Sinha | Partha Pakray
Proceedings of the Second Workshop on Scholarly Document Processing

The huge influx of published papers in the field of machine learning makes summarization of scholarly documents vital, not just to eliminate redundancy but also to provide a complete and satisfying crux of the content. We participated in LongSumm 2021: The 2nd Shared Task on Generating Long Summaries for scientific documents, where the task is to generate long summaries for scientific papers provided by the organizers. This paper discusses our extractive summarization approach to the task. We used the TextRank algorithm with the BM25 score as the similarity function. Although TextRank is a graph-based ranking algorithm that requires no learning, it produced fairly good results with minimal compute power and time. We attained 3rd rank according to ROUGE-1 scores (0.5131 for F-measure and 0.5271 for recall) and performed decently on the ROUGE-2 scores.
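The TextRank-with-BM25 approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sentences are tokenized by lowercasing and whitespace splitting, the BM25 variant (k1 = 1.2, b = 0.75, with a +1 inside the IDF logarithm to keep it non-negative) is one common choice, and the helper names `bm25_matrix`, `textrank`, and `summarize` are hypothetical. Each sentence is scored as a BM25 query against every other sentence, giving a weighted directed graph on which the weighted TextRank update is iterated until the scores stabilize.

```python
import math
from collections import Counter

# Hypothetical helpers; BM25 parameters and tokenization are assumptions.

def bm25_matrix(docs, k1=1.2, b=0.75):
    """weights[i][j] = BM25 score of sentence i (as query) against sentence j.

    Assumes at least one non-empty tokenized sentence."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for d in docs:
        df.update(set(d))  # number of sentences containing each term

    def idf(term):
        # Lucene-style IDF; the +1 inside the log keeps it non-negative.
        return math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)

    weights = [[0.0] * n for _ in range(n)]
    for i, query in enumerate(docs):
        for j, doc in enumerate(docs):
            if i == j:
                continue  # no self-loops in the sentence graph
            tf = Counter(doc)
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            weights[i][j] = sum(
                idf(t) * tf[t] * (k1 + 1) / (tf[t] + norm)
                for t in set(query) if t in tf
            )
    return weights


def textrank(weights, d=0.85, max_iter=100, tol=1e-6):
    """Weighted TextRank: WS(i) = (1 - d) + d * sum_j w_ji / out(j) * WS(j)."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]  # total outgoing weight per node
    scores = [1.0] * n
    for _ in range(max_iter):
        new = [
            (1 - d) + d * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(sentences, k=3):
    """Return the k highest-ranked sentences in their original order."""
    tokens = [s.lower().split() for s in sentences]
    scores = textrank(bm25_matrix(tokens))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no terms with any other sentence receives no incoming edge weight, so it keeps only the (1 - d) baseline score and is ranked last.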

2020

Hindi-Marathi Cross Lingual Model
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Fifth Conference on Machine Translation

Machine Translation (MT) is a vital tool for aiding communication between linguistically separate groups of people. Neural machine translation (NMT) based approaches have gained widespread acceptance because of their outstanding performance. We participated in the WMT20 shared task of similar language translation on the Hindi-Marathi pair. The main challenge of this task is to use monolingual data and the similarity features of the similar language pair to overcome the limited availability of parallel data. In this work, we implemented an NMT-based model that simultaneously learns bilingual embeddings from both the source and target languages. Our model achieved a Hindi to Marathi bilingual evaluation understudy (BLEU) score of 11.59, rank-based intuitive bilingual evaluation score (RIBES) of 57.76, and translation edit rate (TER) score of 79.07, and a Marathi to Hindi BLEU score of 15.44, RIBES score of 61.13, and TER score of 75.96.

Multimodal Neural Machine Translation for English to Hindi
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 7th Workshop on Asian Translation

Machine translation (MT) focuses on the automatic translation of text from one natural language to another. Neural machine translation (NMT) achieves state-of-the-art results in machine translation because it uses advanced deep learning techniques and handles issues like long-term dependency and context analysis. Nevertheless, NMT still suffers from low translation quality for low-resource languages. To address this challenge, the multi-modal concept combines textual and visual features to improve the translation quality of low-resource languages. Moreover, using monolingual data in the pre-training step can improve system performance for low-resource language translation. The Workshop on Asian Translation 2020 (WAT2020) organized a multimodal translation task for English to Hindi. We participated with two track submissions, text-only and multi-modal translation, under the team name CNLP-NITS. The results declared at the WAT2020 translation task show that our multi-modal NMT system attained higher scores than our text-only NMT system on both the challenge and evaluation test sets. For the challenge test data, our multi-modal neural machine translation system achieves a Bilingual Evaluation Understudy (BLEU) score of 33.57, a Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.754141, and an Adequacy-Fluency Metrics (AMFM) score of 0.787320; for the evaluation test data, it achieves BLEU, RIBES, and AMFM scores of 40.51, 0.803208, and 0.820980, respectively, for English to Hindi translation.

Zero-Shot Neural Machine Translation: Russian-Hindi @LoResMT 2020
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Neural machine translation (NMT) is a widely accepted approach in the machine translation (MT) community for translating from one natural language to another. Although NMT performs remarkably well for both high- and low-resource languages, it needs a sufficiently large training corpus. Obtaining parallel corpora for low-resource language pairs is one of the main challenges in MT. To mitigate this issue, NMT can use monolingual corpora to improve translation for low-resource language pairs. The Workshop on Technologies for MT of Low Resource Languages (LoResMT 2020) organized shared tasks on low-resource language pair translation using zero-shot NMT, in which no parallel corpus is used and only monolingual corpora are allowed. We participated in this shared task under the team name CNLP-NITS for the Russian-Hindi language pair. We used masked sequence to sequence pre-training for language generation (MASS) with only monolingual corpora, following the unsupervised NMT architecture. The results declared at the LoResMT 2020 shared task show that our system achieves a bilingual evaluation understudy (BLEU) score of 0.59, precision score of 3.43, recall score of 5.48, F-measure score of 4.22, and rank-based intuitive bilingual evaluation score (RIBES) of 0.180147 for Russian to Hindi translation. For Hindi to Russian translation, we achieved BLEU, precision, recall, F-measure, and RIBES scores of 1.11, 4.72, 4.41, 4.56, and 0.026842, respectively.
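The MASS objective mentioned above masks a contiguous fragment of a monolingual sentence on the encoder side and trains the decoder to reconstruct that fragment. The sketch below shows how one such training example could be built. It assumes a fragment of about half the sentence length (the ratio reported in the original MASS paper), a hypothetical `[MASK]` token string, and a hypothetical `mass_mask` helper; a real system would work on subword IDs inside a sequence-to-sequence model rather than on strings.

```python
import random

MASK = "[MASK]"  # hypothetical mask token string


def mass_mask(tokens, ratio=0.5, rng=None):
    """Build one MASS-style example from a non-empty tokenized sentence.

    Returns (encoder_input, decoder_input, target, start). Hypothetical helper."""
    rng = rng or random.Random()
    m = len(tokens)
    k = max(1, int(m * ratio))          # fragment length, about half the sentence
    start = rng.randint(0, m - k)       # fragment start; randint is inclusive
    end = start + k
    target = tokens[start:end]          # the decoder must reconstruct this span
    # Encoder sees the sentence with the fragment replaced by mask tokens.
    encoder_input = tokens[:start] + [MASK] * k + tokens[end:]
    # Decoder input is the fragment shifted right by one (teacher forcing).
    decoder_input = [MASK] + target[:-1]
    return encoder_input, decoder_input, target, start


enc, dec, tgt, start = mass_mask("a b c d e f".split(), rng=random.Random(0))
```

Because the encoder never sees the masked span and the decoder sees only the span's own prefix, the model has to use the unmasked context to predict the fragment, which is what lets it be pre-trained on monolingual text alone.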

EnAsCorp1.0: English-Assamese Corpus
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Corpus preparation is one of the most important and challenging tasks in machine translation, especially in low-resource language scenarios. In a country like India, where multiple languages exist, machine translation attempts to minimize the communication gap among people with different linguistic backgrounds. Although Google Translate covers automatic translation for many languages all over the world, it lags in some languages, including Assamese. In this paper, we developed EnAsCorp1.0, a corpus for the low-resource English-Assamese pair, where parallel and monolingual data are collected from various online sources. We also implemented baseline systems with statistical machine translation and neural machine translation approaches for the same corpus.

2019

Neural Machine Translation: Hindi-Nepali
Sahinur Rahman Laskar | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

With the extensive use of Machine Translation (MT) technology, there is growing interest in directly translating between pairs of similar languages. The main challenge is to overcome the limited availability of parallel data to produce precise MT output. The current work relies on Neural Machine Translation (NMT) with an attention mechanism for the similar language translation task of WMT19 in the context of the Hindi-Nepali pair. The NMT systems were trained on the Hindi-Nepali parallel corpus and tested and analyzed in Hindi ⇔ Nepali translation. The official results declared at the WMT19 shared task show that our NMT system obtained a Bilingual Evaluation Understudy (BLEU) score of 24.6 for the primary configuration in Nepali to Hindi translation. We also achieved BLEU scores of 53.7 (Hindi to Nepali) and 49.1 (Nepali to Hindi) with the contrastive system type.

English to Hindi Multi-modal Neural Machine Translation and Hindi Image Captioning
Sahinur Rahman Laskar | Rohit Pratap Singh | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 6th Workshop on Asian Translation

With the widespread use of Machine Translation (MT) techniques, attempts are being made to minimize the communication gap among people from diverse linguistic backgrounds. We participated in the Workshop on Asian Translation 2019 (WAT2019) multi-modal translation task. There are three submission tracks for English to Hindi translation: multi-modal translation, Hindi-only image captioning, and text-only translation. The main challenge is to provide precise MT output. The multi-modal concept incorporates textual and visual features in the translation task. In this work, the multi-modal translation track relies on a pre-trained convolutional neural network (CNN), the 19-layer Visual Geometry Group network (VGG19), to extract image features, and on an attention-based Neural Machine Translation (NMT) system for translation. A merge model of a recurrent neural network (RNN) and a CNN is used for Hindi-only image captioning. The text-only translation track is based on the transformer model of the NMT system. The official results of the WAT2019 translation task show that our multi-modal NMT system achieved a Bilingual Evaluation Understudy (BLEU) score of 20.37, a Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.642838, and an Adequacy-Fluency Metrics (AMFM) score of 0.668260 on the challenge test data, and a BLEU score of 40.55, RIBES of 0.760080, and AMFM score of 0.770860 on the evaluation test data for English to Hindi multi-modal translation.

2017

NITMZ-JU at IJCNLP-2017 Task 4: Customer Feedback Analysis
Somnath Banerjee | Partha Pakray | Riyanka Manna | Dipankar Das | Alexander Gelbukh
Proceedings of the IJCNLP 2017, Shared Tasks

In this paper, we describe a deep learning framework for analyzing the customer feedback as part of our participation in the shared task on Customer Feedback Analysis at the 8th International Joint Conference on Natural Language Processing (IJCNLP 2017). A Convolutional Neural Network (CNN) based deep neural network model was employed for the customer feedback task. The proposed system was evaluated on two languages, namely, English and French.

JU NITM at IJCNLP-2017 Task 5: A Classification Approach for Answer Selection in Multi-choice Question Answering System
Sandip Sarkar | Dipankar Das | Partha Pakray
Proceedings of the IJCNLP 2017, Shared Tasks

This paper describes the participation of the JU NITM team in IJCNLP-2017 Task 5: “Multi-choice Question Answering in Examinations”. The main aim of this shared task is to choose the correct option for each multi-choice question. Our proposed model uses vector representations as features and machine learning for classification. First, we represent the question and answer in vector space and then compute the cosine similarity between the two vectors. Finally, we apply a classification approach to find the correct answer. Our system was developed only for English, and it obtained an accuracy of 40.07% on the test dataset and 40.06% on the validation dataset.
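The question-answer similarity step described above can be sketched as below. This is an illustration under assumptions, not the team's system: bag-of-words count vectors stand in for whatever vector representation was actually used, and choosing the option with the highest cosine similarity stands in for the trained classifier the abstract mentions. The helper names are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words count vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[t] for t, count in u.items() if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def option_features(question, options):
    """One similarity feature per answer option; a classifier could consume these."""
    q = bow(question)
    return [cosine(q, bow(o)) for o in options]


def pick_answer(question, options):
    # Stand-in for a trained classifier: choose the most similar option.
    feats = option_features(question, options)
    return max(range(len(options)), key=lambda i: feats[i])
```

In a fuller system, these similarity scores would be one input feature among several to a supervised classifier rather than the decision rule itself.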

2016

JUNITMZ at SemEval-2016 Task 1: Identifying Semantic Similarity Using Levenshtein Ratio
Sandip Sarkar | Dipankar Das | Partha Pakray | Alexander Gelbukh
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
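The Levenshtein ratio named in the title above is derived from the edit distance between two strings. The normalization used in the paper is not stated here, so this sketch assumes one common choice, ratio = 1 - distance / max(len(a), len(b)), with unit costs for insertion, deletion, and substitution; other definitions (for example, counting a substitution as two edits) give different values. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs, computed row by row (hypothetical helper)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumed normalization by the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

For example, "kitten" and "sitting" are three edits apart, so under this normalization their ratio is 1 - 3/7.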

2014

Automatic Building and Using Parallel Resources for SMT from Comparable Corpora
Santanu Pal | Partha Pakray | Sudip Kumar Naskar
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

NTNU: Measuring Semantic Similarity with Sublexical Feature Representations and Soft Cardinality
André Lynum | Partha Pakray | Björn Gambäck | Sergio Jimenez
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

Automatic Evaluation of Summary Using Textual Entailment
Pinaki Bhaskar | Partha Pakray
Proceedings of the Student Research Workshop associated with RANLP 2013

2012

JU_CSE_NLP: Multi-grade Classification of Semantic Similarity between Text Pairs
Snehasis Neogi | Partha Pakray | Sivaji Bandyopadhyay | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

JU_CSE_NLP: Language Independent Cross-lingual Textual Entailment System
Snehasis Neogi | Partha Pakray | Sivaji Bandyopadhyay | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2010

JU: A Supervised Approach to Identify Semantic Relations from Paired Nominals
Santanu Pal | Partha Pakray | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 5th International Workshop on Semantic Evaluation