Napat Laosaengpha
2025
Shortcut Learning in Safety: The Impact of Keyword Bias in Safeguards
Panuthep Tasawong | Napat Laosaengpha | Wuttikorn Ponwitayarat | Sitiporn Lim | Potsawee Manakul | Samuel Cahyawijaya | Can Udomcharoenchaikit | Peerat Limkonchotiwat | Ekapol Chuangsuwanich | Sarana Nutanong
Proceedings of the First Workshop on LLM Security (LLMSEC)
This paper investigates the problem of shortcut learning in safety guardrails for large language models (LLMs). It reveals that current safeguard models often rely excessively on superficial cues, such as specific keywords that are spuriously correlated with training labels, rather than genuinely understanding the input’s semantics or intent. As a result, their performance degrades significantly when there is a shift in keyword distribution. The paper also examines the impact of reducing shortcut reliance, showing that merely minimizing shortcut influence is insufficient. To build robust safeguard models, it is equally crucial to promote the use of intended features.
2024
Learning Job Title Representation from Job Description Aggregation Network
Napat Laosaengpha | Thanit Tativannarat | Chawan Piansaddhayanon | Attapol Rutherford | Ekapol Chuangsuwanich
Findings of the Association for Computational Linguistics: ACL 2024
Learning job title representation is a vital process for developing automatic human resource tools. To do so, existing methods primarily rely on learning the title representation through skills extracted from the job description, neglecting the rich and diverse content within. Thus, we propose an alternative framework for learning job titles through their respective job descriptions (JDs). The framework uses a Job Description Aggregator component to handle the lengthy descriptions and a bidirectional contrastive loss to account for the bidirectional relationship between a job title and its description. We evaluated our method in both in-domain and out-of-domain settings, where it outperformed the skill-based approach.