Utsav Maskey
2025
Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities
Utsav Maskey | Chencheng Zhu | Usman Naseem
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advancements in Large Language Models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis, a critical area for data security, and its connection to LLMs’ generalization abilities remain underexplored in LLM evaluations. To address this gap, we evaluate the cryptanalytic potential of state-of-the-art LLMs on ciphertexts produced by a range of cryptographic algorithms. We introduce a benchmark dataset of diverse plaintexts, spanning multiple domains, lengths, writing styles, and topics, paired with their encrypted versions. Using zero-shot and few-shot settings along with chain-of-thought prompting, we assess LLMs’ decryption success rates and discuss their comprehension abilities. Our findings reveal key insights into LLMs’ strengths and limitations in side-channel scenarios and raise concerns about their susceptibility to under-generalization-related attacks. This research highlights the dual-use nature of LLMs in security contexts and contributes to the ongoing discussion on AI safety and security.
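As a concrete illustration of the kind of setup the abstract describes, the sketch below (not the paper's released code) encrypts plaintexts with a simple Caesar cipher, prompts a model zero-shot to recover them, and reports a decryption success rate. The `query_llm` function, the choice of cipher, and the prompt wording are assumptions, not details taken from the paper.

```python
# Minimal sketch of a cipher-decryption benchmark loop, assuming a Caesar cipher
# and an unspecified LLM backend. `query_llm` is a hypothetical placeholder.

def caesar_encrypt(plaintext: str, shift: int = 3) -> str:
    """Shift alphabetic characters by `shift`; leave other characters unchanged."""
    out = []
    for ch in plaintext:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def query_llm(prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to an LLM API and return its reply."""
    raise NotImplementedError("plug in a model API here")

def decryption_success_rate(plaintexts, shift: int = 3) -> float:
    """Fraction of ciphertexts the model decrypts back to the exact plaintext."""
    hits = 0
    for pt in plaintexts:
        ct = caesar_encrypt(pt, shift)
        prompt = (
            "The following text was encrypted with a Caesar cipher. "
            f"Recover the original plaintext:\n{ct}"
        )
        hits += int(query_llm(prompt).strip() == pt)
    return hits / len(plaintexts)
```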
2022
Nepali Encoder Transformers: An Analysis of Auto Encoding Transformer Language Models for Nepali Text Classification
Utsav Maskey | Manish Bhatta | Shiva Bhatt | Sanket Dhungel | Bal Krishna Bal
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Language model pre-training has significantly impacted NLP, yielding performance gains on many NLP tasks, yet a comparative study of different approaches across low-resource languages is still missing. This paper investigates appropriate methods for pre-training a Transformer-based model for the Nepali language, focusing on the language-specific aspects that need to be considered during modeling. Although some language models have been trained for Nepali, existing work is far from sufficient. We train three distinct Transformer-based masked language models on Nepali text: distilbert-base (Sanh et al., 2019) for its efficiency and compactness, deberta-base (He et al., 2020) for its ability to model dependencies between nearby token pairs, and XLM-RoBERTa (Conneau et al., 2020) for its ability to handle multilingual downstream tasks. We evaluate and compare these models with other Transformer-based models on a downstream classification task, with the aim of suggesting an effective strategy for training and fine-tuning low-resource language models.
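For context, the following minimal sketch (not the paper's training code) shows how one of the compared checkpoints, e.g. xlm-roberta-base, could be fine-tuned for a Nepali text-classification task with Hugging Face Transformers. The example sentences, labels, label count, and output directory are hypothetical placeholders for an actual labeled Nepali corpus.

```python
# Sketch of fine-tuning a pretrained masked LM for Nepali classification,
# assuming the Hugging Face Transformers Trainer API. Data below is placeholder.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts = ["नेपाली समाचार उदाहरण", "अर्को उदाहरण वाक्य"]  # placeholder sentences
labels = [0, 1]                                          # placeholder class ids

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)

enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class TinyDataset(Dataset):
    """Wrap tokenized inputs and labels for the Trainer API."""
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="nepali-clf", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=TinyDataset(),
)
trainer.train()
```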