SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization

Noor Islam S. Mohammad, Ulug Bayazit


Abstract
Fine-tuned encoders deployed across heterogeneous NLP tasks face three compounding problems: mismatched inductive biases, class-imbalance corruption of feature statistics, and the lack of any mechanism for conditioning attention on external lexical knowledge. We introduce SURGELLM, a unified transformer framework that addresses each problem with a dedicated lightweight module: a surgical feature gate (a learned per-dimension sigmoid over curated lexical indicators and [CLS], which provably degenerates to the identity when features are uninformative), task-conditioned prefix tokens (quantized feature values and task identity prepended to every input), and Instance-Weighted Normalization (IWN), which removes class-prior bias from gate statistics. We prove an excess-risk bound linking gate benefit to surgical feature alignment. Across four tasks (SST-2, multi-hop retrieval, LLM-prompt attribution, and authorship detection) covering 17,830 examples, with eleven model variants evaluated over three seeds, the IWN variant achieves a macro-F1 of 0.940 (+0.036 over the strongest non-IWN baseline; +0.130 on authorship detection). A random-vocabulary control (-0.028 avg. F1) confirms that the gains are lexical, not parametric. Code, vocabularies, and a 99.5%-recovery auto-extraction recipe are released.
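As a rough illustration of the gating idea in the abstract, the sketch below shows one possible per-dimension sigmoid gate over a [CLS] vector and a lexical feature vector. It is not the authors' implementation: the residual form `h + g * (U f)`, the function name `surgical_gate`, and the parameters `W_gate`, `b_gate`, and `U_proj` are all assumptions made for illustration, and "uninformative" is modeled here simply as an all-zero feature vector.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def surgical_gate(cls_vec, lex_feats, W_gate, b_gate, U_proj):
    """Hypothetical gate (an assumption, not the paper's definition).

    Computes h' = h + g * (U f), where g = sigmoid(W [h; f] + b) is a
    per-dimension gate over the concatenated [CLS] vector h and the
    lexical feature vector f.
    """
    joint = list(cls_vec) + list(lex_feats)  # concatenation [h; f]
    gated = []
    for d, h_d in enumerate(cls_vec):
        # One sigmoid gate value per hidden dimension.
        g_d = sigmoid(sum(w * x for w, x in zip(W_gate[d], joint)) + b_gate[d])
        # Projection of the lexical features into dimension d.
        u_d = sum(u * f for u, f in zip(U_proj[d], lex_feats))
        gated.append(h_d + g_d * u_d)
    return gated


# Toy parameters: 2 hidden dimensions, 3 lexical indicators.
h = [0.5, -1.0]
W = [[0.0] * 5, [0.0] * 5]   # zero weights, so every gate value is sigmoid(0)
b = [0.0, 0.0]
U = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

identity_out = surgical_gate(h, [0.0, 0.0, 0.0], W, b, U)
active_out = surgical_gate(h, [1.0, 0.0, 0.0], W, b, U)
```

In this toy form, an all-zero feature vector leaves the [CLS] vector unchanged, which mirrors the degenerate-to-identity property the abstract mentions; how the paper actually obtains that property is not specified in this text.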
Anthology ID:
2026.trustnlp-main.47
Volume:
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Kai-Wei Chang, Ninareh Mehrabi, Satyapriya Krishna, Anubrata Das, Jwala Dhamala, Yang Trista Cao, Tharindu Kumarage, Anil Ramakrishna, Christos Christodoulopoulos, Yixin Wan, Aram Galstyan, Anoop Kumar, Rahul Gupta
Venues:
TrustNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
600–617
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.47/
Cite (ACL):
Noor Islam S. Mohammad and Ulug Bayazit. 2026. SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization. In Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026), pages 600–617, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization (Mohammad & Bayazit, TrustNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.47.pdf