Xinhao Song


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
ALIS: Aligned LLM Instruction Security Strategy for Unsafe Input Prompt
Xinhao Song | Sufeng Duan | Gongshen Liu
Proceedings of the 31st International Conference on Computational Linguistics

In large language models, existing instruction tuning methods may fail to balance the performance with robustness against attacks from user input like prompt injection and jailbreaking. Inspired by computer hardware and operating systems, we propose an instruction tuning paradigm named Aligned LLM Instruction Security Strategy (ALIS) to enhance model performance by decomposing user inputs into irreducible atomic instructions and organizing them into instruction streams which will guide the response generation of model. ALIS is a hierarchical structure, in which user inputs and system prompts are treated as user and kernel mode instructions respectively. Based on ALIS, the model can maintain security constraints by ignoring or rejecting the input instructions when user mode instructions attempt to conflict with kernel mode instructions. To build ALIS, we also develop an automatic instruction generation method for training ALIS, and give one instruction decomposition task and respective datasets. Notably, the ALIS framework with a small model to generate instruction streams still improve the resilience of LLM to attacks substantially without any lose on general capabilities.