Ching-Yun Ko
2026
ImReasoner: Improving Memory-based Language Models for Reasoning-in-a-Haystack Tasks
Ching-Yun Ko | Payel Das | Sihui Dai | Georgios Kollias | Subhajit Chaudhury | Aurelie C. Lozano | Pin-Yu Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reasoning over long contexts remains a major challenge for language models, particularly when solving tasks that require integrating multiple facts in sequence or generalizing to new distributions. We argue that this difficulty stems from a lack of structural inductive bias. Recently, alternative frameworks have been proposed to explicitly encode contexts as ordered memory and perform iterative retrieval to construct reasoning chains. Despite the promising results shown in prior work, these frameworks still rely heavily on intermediate chain supervision and fall short of showing emergent reasoning generalization in the presence of hard distractions in reasoning-in-a-haystack tasks. Furthermore, we discover that as the number of distractions increases, traditional episodic memory reads suffer from ill-conditioning problems, which lead to inaccurate context retrieval. In this work, we formalize the motivation for the necessary inductive bias in reasoning-in-a-haystack tasks, propose inference-time memory update procedures mimicking the "identify and remove unnecessary and unrelated details" strategy in *constructively responsive reading*, introduce staged training inspired by human conceptual understanding, and finally demonstrate the possibilities and limits of such a framework in the weakly supervised scenario.
AI Steerability 360: A Toolkit for Steering Large Language Models
Erik Miehling | Karthikeyan Natesan Ramamurthy | Praveen Venkateswaran | Ching-Yun Ko | Pierre Dognin | Moninder Singh | Tejaswini Pedapati | Avinash Balakrishnan | Matthew Riemer | Dennis Wei | Inge Vejsbjerg | Elizabeth M. Daly | Kush R. Varshney
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
The AI Steerability 360 toolkit is an extensible, open-source Python library for steering LLMs. Steering abstractions are designed around four model control surfaces: input (modification of the prompt), structural (modification of the model’s weights or architecture), state (modification of the model’s activations and attentions), and output (modification of the decoding or generation process). Steering methods exert control on the model through a common interface, termed a steering pipeline, which additionally allows for the composition of multiple steering methods. Comprehensive evaluation and comparison of steering methods/pipelines is facilitated by use case classes (for defining tasks) and a benchmark class (for performance comparison on a given task). The functionality provided by the toolkit significantly lowers the barrier to developing and comprehensively evaluating steering methods. The toolkit is Hugging Face native and is released under an Apache 2.0 license at https://github.com/IBM/AISteer360.
2025
STAR: Spectral Truncation and Rescale for Model Merging
Yu-Ang Lee | Ching-Yun Ko | Tejaswini Pedapati | I-Hsin Chung | Mi-Yen Yeh | Pin-Yu Chen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Model merging is an efficient way of obtaining a multi-task model from several pretrained models without further fine-tuning, and it has gained attention in various domains, including natural language processing (NLP). Despite this efficiency, a key challenge in model merging is the seemingly inevitable decrease in task performance as the number of models increases. In this paper, we propose **S**pectral **T**runcation **A**nd **R**escale (STAR), which aims to mitigate “merging conflicts” by truncating small components in the respective spectral spaces, followed by an automatic parameter rescaling scheme that retains the nuclear norm of the original matrix. STAR requires no additional inference on original training data and is robust to hyperparameter choice. We demonstrate the effectiveness of STAR through extensive model merging cases on diverse NLP tasks. Specifically, STAR works robustly across varying model sizes, and can outperform baselines by 4.2% when merging 12 models on Flan-T5. Our code is publicly available at https://github.com/IBM/STAR.
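The truncate-then-rescale idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function name `spectral_truncate_rescale`, the `keep_fraction` parameter, and the rule for choosing the rank (keep the smallest rank whose singular values cover a given share of the nuclear norm) are assumptions made for illustration; only the two steps themselves, dropping small spectral components and rescaling so the nuclear norm matches the original matrix, come from the abstract.

```python
import numpy as np


def spectral_truncate_rescale(delta, keep_fraction=0.5):
    """Truncate small singular components of `delta`, then rescale.

    `delta` stands for one weight-difference matrix (fine-tuned minus
    pretrained). `keep_fraction` is a hypothetical knob: the share of the
    nuclear norm that the kept singular values must cover.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    total = s.sum()  # nuclear norm of the original matrix
    if total == 0.0:
        return delta.copy()

    # Smallest rank whose leading singular values reach the target share.
    cumulative = np.cumsum(s)
    rank = int(np.searchsorted(cumulative, keep_fraction * total)) + 1
    rank = min(rank, len(s))

    # Rescale the kept singular values so their sum equals the original
    # nuclear norm, then rebuild the low-rank matrix.
    kept = s[:rank]
    scale = total / kept.sum()
    return (u[:, :rank] * (kept * scale)) @ vt[:rank]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    example = rng.normal(size=(6, 4))
    processed = spectral_truncate_rescale(example, keep_fraction=0.6)
    print("rank after truncation:", np.linalg.matrix_rank(processed))
    print("nuclear norm before:", np.linalg.norm(example, "nuc"))
    print("nuclear norm after: ", np.linalg.norm(processed, "nuc"))
```

In a merging setting, a step like this would presumably be applied to each model's weight difference before the differences are combined; how they are combined is outside what the abstract specifies.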
Attention Tracker: Detecting Prompt Injection Attacks in LLMs
Kuo-Han Hung | Ching-Yun Ko | Ambrish Rawat | I-Hsin Chung | Winston H. Hsu | Pin-Yu Chen
Findings of the Association for Computational Linguistics: NAACL 2025
Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks, where malicious inputs manipulate the model into ignoring the original instructions and executing a designated action. In this paper, we investigate the underlying mechanisms of these attacks by analyzing the attention patterns within LLMs. We introduce the concept of the distraction effect, where specific attention heads, termed important heads, shift focus from the original instruction to the injected instruction. Building on this discovery, we propose Attention Tracker, a training-free detection method that tracks attention patterns on the instruction to detect prompt injection attacks without the need for additional LLM inference. Our method generalizes effectively across diverse models, datasets, and attack types, showing an AUROC improvement of up to 10.0% over existing methods, and performs well even on small LLMs. We demonstrate the robustness of our approach through extensive evaluations and provide insights into safeguarding LLM-integrated systems from prompt injection vulnerabilities.
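One way to picture "tracking attention on the instruction" is a simple score: for a chosen set of heads, sum the attention that the final token places on the instruction tokens, then average across heads. The sketch below is only an illustration under stated assumptions, not the paper's method. The names `focus_score` and `looks_injected`, the array layout `(layers, heads, sequence_length)`, the use of the last token's attention row, and the `threshold` value are all hypothetical; how important heads are actually selected is not described in the abstract.

```python
import numpy as np


def focus_score(last_token_attention, instruction_span, important_heads):
    """Average attention mass that selected heads place on the instruction.

    last_token_attention: array of shape (layers, heads, sequence_length)
        holding the attention row of the final token (assumed layout).
    instruction_span: slice covering the instruction token positions.
    important_heads: list of (layer, head) index pairs (assumed given).
    """
    per_head = [
        last_token_attention[layer, head, instruction_span].sum()
        for layer, head in important_heads
    ]
    return float(np.mean(per_head))


def looks_injected(last_token_attention, instruction_span, important_heads,
                   threshold=0.5):
    """Flag the input when attention on the instruction drops below threshold.

    The threshold is a hypothetical value; in practice it would need to be
    calibrated on held-out data.
    """
    score = focus_score(last_token_attention, instruction_span, important_heads)
    return score < threshold


if __name__ == "__main__":
    # Toy attention rows: one layer, two heads, four token positions.
    # Positions 0-1 play the role of the original instruction.
    toy = np.array([[[0.4, 0.4, 0.1, 0.1],
                     [0.1, 0.1, 0.4, 0.4]]])
    span = slice(0, 2)
    print(focus_score(toy, span, [(0, 0)]))
    print(looks_injected(toy, span, [(0, 1)]))
```

A low score corresponds to the distraction effect described above: the selected heads attend elsewhere, for example to an injected instruction, rather than to the original one.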
Co-authors
- Pin-Yu Chen 3
- I-Hsin Chung 2
- Tejaswini Pedapati 2
- Avinash Balakrishnan 1
- Subhajit Chaudhury 1
- Sihui Dai 1
- Elizabeth M. Daly 1
- Payel Das 1
- Pierre Dognin 1
- Winston H. Hsu 1
- Kuo-Han Hung 1
- Georgios Kollias 1
- Yu-Ang Lee 1
- Aurelie C. Lozano 1
- Erik Miehling 1
- Karthikeyan Natesan Ramamurthy 1
- Ambrish Rawat 1
- Matthew Riemer 1
- Moninder Singh 1
- Kush R. Varshney 1
- Inge Vejsbjerg 1
- Praveen Venkateswaran 1
- Dennis Wei 1
- Mi-Yen Yeh 1