Not All Tokens Are Equal: Per-Dimension Top-K Pooling for Adversarially Robust BERT Classification
Manoranjan Dash, Shivam Anand Aralikatti, Shanay Sheth, Pranav Shinde
Abstract
Contextual text classification with BERT typically relies on the [CLS] token representation for downstream prediction. While effective under standard conditions, [CLS]-based pooling is brittle under adversarial perturbation, as its single-vector representation is indiscriminately influenced by injected adversarial tokens. We propose Per-Dimension Top-K Average Pooling, a pooling strategy that, for each hidden dimension, aggregates only the top-K token activations rather than the full sequence, thereby controlling which tokens contribute to the final representation. This token-level selectivity acts as a natural filter against adversarial injection: tokens that do not rank among the top-K for a given dimension are excluded from aggregation in that dimension. We evaluate our approach against CLS, Global Average Pooling (GAP), Global Max Pooling (GMP), and Hybrid variants across three text classification domains: spam detection (Enron and LingSpam), automated essay scoring (ASAP), and hate speech classification. On the Enron spam dataset under adversarial attack, our best Hybrid (K=3) variant reduces the Attack Success Rate from 70.65% to 37.07% while maintaining clean accuracy above 99%, compared with CLS, which degrades to 63.64% adversarial accuracy. Representation-level analyses further corroborate these findings: Top-K pooling variants exhibit a substantially smaller cosine similarity shift under attack, and adversarially injected tokens enter the top-K selection in far fewer dimensions than with CLS. These results suggest that per-dimension token selectivity offers a principled and lightweight mechanism for adversarial robustness in BERT-based classifiers, without any modification to the underlying model architecture.
- Anthology ID: 2026.gem-main.29
- Volume: Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
- Month: July
- Year: 2026
- Address: San Diego, California, USA
- Editors: Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
- Venues: GEM | WS
- Publisher: Association for Computational Linguistics
- Pages: 285–295
- URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.29/
- Cite (ACL): Manoranjan Dash, Shivam Anand Aralikatti, Shanay Sheth, and Pranav Shinde. 2026. Not All Tokens Are Equal: Per-Dimension Top-K Pooling for Adversarially Robust BERT Classification. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 285–295, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal): Not All Tokens Are Equal: Per-Dimension Top-K Pooling for Adversarially Robust BERT Classification (Dash et al., GEM 2026)
- PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.29.pdf
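The per-dimension Top-K average pooling described in the abstract can be sketched as follows. This is a minimal illustration in plain Python over a toy token-by-dimension matrix (in practice, a BERT last hidden state), not the authors' implementation. The helper names (`topk_average_pool`, `dims_where_token_in_topk`, `cosine_shift`), the padding mask, the clamping of K for short sequences, and the reading of "cosine similarity shift" as one minus cosine similarity are all assumptions. The Hybrid variants are not sketched because their construction is not specified here.

```python
import math
from typing import List, Optional, Sequence

# Rows are tokens, columns are hidden dimensions (e.g. a BERT last hidden state).
Matrix = List[List[float]]


def topk_average_pool(token_embeddings: Matrix, k: int,
                      mask: Optional[Sequence[int]] = None) -> List[float]:
    """Per hidden dimension, average only the k largest token activations.

    mask (assumption): 1 for real tokens, 0 for padding; padding is never pooled.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if mask is None:
        mask = [1] * len(token_embeddings)
    rows = [row for row, keep in zip(token_embeddings, mask) if keep]
    if not rows:
        raise ValueError("no unmasked tokens to pool")
    k_eff = min(k, len(rows))  # clamp K when the sequence is shorter than K
    pooled = []
    for d in range(len(rows[0])):
        # Rank this dimension's activations independently of all other dimensions.
        column = sorted((row[d] for row in rows), reverse=True)
        pooled.append(sum(column[:k_eff]) / k_eff)
    return pooled


def dims_where_token_in_topk(token_embeddings: Matrix, token_index: int,
                             k: int) -> int:
    """Hypothetical helper: count dimensions where token_index ranks in the top k."""
    count = 0
    for d in range(len(token_embeddings[0])):
        ranked = sorted(range(len(token_embeddings)),
                        key=lambda i: token_embeddings[i][d], reverse=True)
        if token_index in ranked[:k]:
            count += 1
    return count


def cosine_shift(clean: Sequence[float], attacked: Sequence[float]) -> float:
    """Assumed definition: 1 - cosine similarity of clean vs. attacked vectors."""
    dot = sum(a * b for a, b in zip(clean, attacked))
    norm = (math.sqrt(sum(a * a for a in clean))
            * math.sqrt(sum(b * b for b in attacked)))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / norm


if __name__ == "__main__":
    # Toy 4-token x 3-dimension matrix; values are made up for illustration.
    tokens = [
        [0.9, 0.1, 0.5],
        [0.2, 0.8, 0.4],
        [0.7, 0.3, 0.9],
        [0.1, 0.6, 0.2],
    ]
    print(topk_average_pool(tokens, k=2))  # approximately [0.8, 0.7, 0.7]
    print(dims_where_token_in_topk(tokens, 2, k=2))  # 2
    # Append a hypothetical injected token and measure how far the pooled vector moves.
    attacked = tokens + [[0.95, 0.0, 0.0]]
    print(cosine_shift(topk_average_pool(tokens, 2), topk_average_pool(attacked, 2)))
```

Under this formulation, K=1 reduces to Global Max Pooling and K equal to the number of unmasked tokens reduces to Global Average Pooling, so Top-K average pooling interpolates between those two baselines.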