Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers

Somnath Banerjee; Rima Hazra; Animesh Mukherjee

Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers

Somnath Banerjee, Rima Hazra, Animesh Mukherjee

Abstract

LLMs are now embedded in workflows that span languages, modalities, and tools. This raises safety challenges that outpace conventional “guardrails”: jailbreaks and prompt injections, attributional safety failures under code-mixing, multimodal bypass via typography and icons, activation-level manipulation, and agentic risks from tool use. This tutorial synthesizes the newest advances (2023–2025) and lays out open research questions around (i) failure modes in monolingual / multilingual / multimodal settings, (ii) training-time and inference-time defenses (rejection SFT, RLHF/RLAIF, decoding-time safety, parameter/activation steering), and (iii) evaluation and red-teaming pipelines balancing safety and utility. We anchor the tutorial with recent results including our safety related papers published at top tier conferences, and connect them to emerging best practices from recent safety tutorials. The target audience is researchers/engineers with basic NLP knowledge who want the latest techniques and a research roadmap; format is half-day with short demos and Q&A.

Anthology ID:: 2025.ijcnlp-tutorials.5
Volume:: Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract
Month:: December
Year:: 2025
Address:: Mumbai, India
Editors:: Benjamin Heinzerling, Lun-Wei Ku
Venue:: IJCNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 25–33
Language:
URL:: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-tutorials.5/
DOI:
Bibkey:
Cite (ACL):: Somnath Banerjee, Rima Hazra, and Animesh Mukherjee. 2025. Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract, pages 25–33, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):: Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers (Banerjee et al., IJCNLP 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-tutorials.5.pdf

PDF Cite Search Fix data