Abdul Khan


Probabilistic Robustness for Data Filtering
Yu Yu | Abdul Khan | Shahram Khadivi | Jia Xu
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

We introduce our probabilistic robustness rewarded data optimization (PRoDO) approach as a framework to enhance the model’s generalization power by selecting training data that optimizes our probabilistic robustness metrics. We use proximal policy optimization (PPO) reinforcement learning to approximately solve the computationally intractable training subset selection problem. The PPO’s reward is defined as our ($\alpha, \epsilon, \gamma$)-Robustness that measures performance consistency over multiple domains by simulating unknown test sets in real-world scenarios using a leaving-one-out strategy. We demonstrate that our PRoDO effectively filters data that lead to significantly higher prediction accuracy and robustness on unknown-domain test sets. Our experiments achieve up to +17.2% increase of accuracy (+25.5% relatively) in sentiment analysis, and -28.05 decrease of perplexity (-32.1% relatively) in language modeling. In addition, our probabilistic ($\alpha, \epsilon, \gamma$)-Robustness definition serves as an evaluation metric with higher levels of agreement with human annotations than typical performance-based metrics.


Analyzing Encoded Concepts in Transformer Language Models
Hassan Sajjad | Nadir Durrani | Fahim Dalvi | Firoj Alam | Abdul Khan | Jia Xu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose a novel framework, ConceptX, to analyze how latent concepts are encoded in representations learned within pre-trained language models. It uses clustering to discover the encoded concepts and explains them by aligning with a large set of human-defined concepts. Our analysis on seven transformer language models reveals interesting insights: i) the latent space within the learned representations overlaps with different linguistic concepts to a varying degree, ii) the lower layers in the model are dominated by lexical concepts (e.g., affixation) and linguistic ontologies (e.g., WordNet), whereas the core-linguistic concepts (e.g., morphology, syntactic relations) are better represented in the middle and higher layers, iii) some encoded concepts are multi-faceted and cannot be adequately explained using the existing human-defined concepts.

SIT at MixMT 2022: Fluent Translation Built on Giant Pre-trained Models
Abdul Khan | Hrishikesh Kanade | Girish Budhrani | Preet Jhanglani | Jia Xu
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the Stevens Institute of Technology’s submission for the WMT 2022 Shared Task: Code-mixed Machine Translation (MixMT). The task consisted of two subtasks: subtask 1, Hindi/English to Hinglish translation, and subtask 2, Hinglish to English translation. Our findings lie in the improvements made through the use of large pre-trained multilingual NMT models and in-domain datasets, as well as back-translation and ensemble techniques. The translation output is automatically evaluated against the reference translations using ROUGE-L and WER. Our system achieves the 1st position on subtask 2 according to ROUGE-L, WER, and human evaluation, 1st position on subtask 1 according to WER and human evaluation, and 3rd position on subtask 1 with respect to the ROUGE-L metric.


Hunter NMT System for WMT18 Biomedical Translation Task: Transfer Learning in Neural Machine Translation
Abdul Khan | Subhadarshi Panda | Jia Xu | Lampros Flokas
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission of Hunter Neural Machine Translation (NMT) to the WMT’18 Biomedical translation task from English to French. The discrepancy between training and test data distributions makes translating text in new domains challenging. Beyond the previous work of combining in-domain with out-of-domain models, we found accuracy and efficiency gains in combining different in-domain models. We conduct extensive experiments on NMT with transfer learning. We train on different in-domain Biomedical datasets one after another, so that the parameters from the previous training run serve as the initialization of the next one. Together with a pre-trained out-of-domain News model, we enhanced translation quality by 3.73 BLEU points over the baseline. Furthermore, we applied ensemble learning on models from intermediate training epochs and achieved an improvement of 4.02 BLEU points over the baseline. Overall, our system is 11.29 BLEU points above the best system of last year on the EDP 2017 test set.