Tongxi Wang

2026

FBS: Modeling Native Parallel Reading inside a Transformer
Tongxi Wang
Findings of the Association for Computational Linguistics: ACL 2026

Large language models (LLMs) excel across many tasks, yet inference is still dominated by strictly token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss core ingredients of human reading: content-adaptive foresight, chunk-structure-aware compute allocation, and train–test consistency for preview/skimming. We propose the Fovea–Block–Skip Transformer (FBS), which injects a causal, trainable loop into Transformers via three modules: the Parafovea-Attention Window (PAW), the Chunk-Head (CH), and the Skip-Gate (SG). Across diverse benchmarks, FBS improves the quality–efficiency trade-off without adding parameters, and ablations show that the three modules are complementary.
