Tian Niezing


2026

We investigate the content effect bias in Large Language Models (LLMs) as part of SemEval 2026 Task 11. We compare supervised fine-tuning (SFT) with low-rank adaptation against activation steering across several model families, including LLaMA, Gemma, and Qwen. Our results show that SFT improves accuracy, with LLaMA 8B reaching 98.75\%. Activation steering, by contrast, offers limited effectiveness in mitigating the content effect bias. A logit lens analysis further reveals that fine-tuning shifts the model’s focus toward logical structure, specifically within the later layers.