Arda Akdemir
2026
RAPIDS: Resume Attack Prompt Injection Detection at Scale
Yohann Augey | Joshua H. Levy | Arda Akdemir
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Yohann Augey | Joshua H. Levy | Arda Akdemir
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
The integration of Large Language Models (LLMs) into recruitment workflows has introduced a critical security vulnerability: indirect prompt injection attacks embedded within resumes can manipulate screening tools to override instructions, effectively jailbreaking the hiring process. Frontier LLMs can detect such anomalies, but deploying them at the scale required for high-volume recruitment is prohibitively slow and costly. At the same time, existing generic prompt injection detectors lack the domain specificity needed for nuanced resume attacks. To address this gap, we introduce RAPIDS, a scalable detection framework with three contributions. First, we release a synthetically generated dataset of injection snippets derived from curated attack seeds spanning multiple adversarial strategies to address data scarcity in this domain. Second, we fine-tune a lightweight Small Language Model (SLM) on this data that outperforms the best off-the-shelf detector by over 50% in relative F1 and approaches frontier LLM accuracy. Third, we propose a cascade architecture in which the fine-tuned SLM serves as a high-recall first stage followed by an LLM verifier. This design achieves ≥ 98% end-to-end recall on both evaluated datasets while delivering a 21-24× latency reduction over standalone frontier LLMs (GPT-5-mini), bringing expected per-request latency to 115-171 ms at roughly 3.5% of the API cost.
2022
Developing Language Resources and NLP Tools for the North Korean Language
Arda Akdemir | Yeojoo Jeon | Tetsuo Shibuya
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Arda Akdemir | Yeojoo Jeon | Tetsuo Shibuya
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Since the division of Korea, the two Korean languages have diverged significantly over the last 70 years. However, due to the lack of linguistic source of the North Korean language, there is no DPRK-based language model. Consequently, scholars rely on the Korean language model by utilizing South Korean linguistic data. In this paper, we first present a large-scale dataset for the North Korean language. We use the dataset to train a BERT-based language model, DPRK-BERT. Second, we annotate a subset of this dataset for the sentiment analysis task. Finally, we compare the performance of different language models for masked language modeling and sentiment analysis tasks.
2020
Research on Task Discovery for Transfer Learning in Deep Neural Networks
Arda Akdemir
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Arda Akdemir
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Deep neural network based machine learning models are shown to perform poorly on unseen or out-of-domain examples by numerous recent studies. Transfer learning aims to avoid overfitting and to improve generalizability by leveraging the information obtained from multiple tasks. Yet, the benefits of transfer learning depend largely on task selection and finding the right method of sharing. In this thesis, we hypothesize that current deep neural network based transfer learning models do not achieve their fullest potential for various tasks and there are still many task combinations that will benefit from transfer learning that are not considered by the current models. To this end, we started our research by implementing a novel multi-task learner with relaxed annotated data requirements and obtained a performance improvement on two NLP tasks. We will further devise models to tackle tasks from multiple areas of machine learning, such as Bioinformatics and Computer Vision, in addition to NLP.