Yang Lyu
2026
Privacy Risks of Intermediate Representations: Attribute Inference in Distributed LLM Inference
Yang Lyu | Jin Cao | Yang Xiao | Zhe Sun | Ben Niu | Fenghua Li | Hui LI
Findings of the Association for Computational Linguistics: ACL 2026
Distributed LLM inference avoids sending raw inputs by transmitting intermediate hidden states, a practice widely assumed to preserve privacy. We challenge this assumption and demonstrate that intermediate representations alone are sufficient to leak sensitive user attributes. This setting poses a fundamental obstacle for existing attribute inference attacks, which typically rely on auxiliary embedding-attribute pairs. To characterize this previously underexplored privacy risk, we reformulate attribute inference as zero-shot matching over candidate attributes directly in the intermediate representation space, and introduce an attribute inference attack based purely on intermediate representations, termed IR-AIA. Two structural challenges hinder attribute inference from intermediate representations: we propose SG-APCR to counter layer-dependent anisotropy in intermediate embeddings, and a sliding-window similarity matching strategy to handle subword-level semantic fragmentation. Experiments across three LLMs and three real-world datasets show that sensitive attributes can be reliably inferred using only intermediate representations, achieving Top-1 accuracy of up to 0.997 on CMS, 0.980 on Skytrax, and 0.986 on ECHR. These results reveal that intermediate states commonly considered safe to share can expose sensitive personal attributes on their own.
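The zero-shot matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' IR-AIA implementation: the abstract does not specify the algorithm, so the function names (`window_score`, `infer_attribute`), the mean pooling, and the synthetic data below are all assumptions made for illustration. The sketch assumes the attacker can obtain per-token hidden states for each candidate attribute string at the same layer as the intercepted states, and it does not model anisotropy correction (the role SG-APCR plays in the paper). A candidate that spans several subword tokens is compared against every window of the same length, so a fragmented attribute can still be matched as a unit.

```python
import numpy as np

def _unit(x):
    """L2-normalise along the last axis (small epsilon avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def window_score(hidden, cand):
    """Best cosine similarity between the mean-pooled candidate and any
    mean-pooled window of len(cand) consecutive token states.

    hidden: (T, d) intercepted hidden states, one row per token.
    cand:   (k, d) hidden states of a candidate attribute's subword tokens.
    """
    k = cand.shape[0]
    if hidden.shape[0] < k:
        return -1.0  # sequence too short to contain this candidate
    target = _unit(cand.mean(axis=0))
    # Shape (T - k + 1, d, k): the window dimension is appended last.
    windows = np.lib.stride_tricks.sliding_window_view(hidden, k, axis=0)
    pooled = _unit(windows.mean(axis=-1))
    return float((pooled @ target).max())

def infer_attribute(hidden, candidates):
    """Return the best-scoring candidate name and all scores."""
    scores = {name: window_score(hidden, vecs) for name, vecs in candidates.items()}
    return max(scores, key=scores.get), scores

# Synthetic demo (hypothetical data, not from the paper's datasets).
rng = np.random.default_rng(0)
d = 32
candidates = {
    "asthma": rng.normal(size=(2, d)),    # pretend 2 subword tokens
    "diabetes": rng.normal(size=(3, d)),  # pretend 3 subword tokens
    "migraine": rng.normal(size=(2, d)),
}
hidden = rng.normal(size=(12, d))
# Plant a slightly noisy copy of "diabetes" at token positions 5..7.
hidden[5:8] = candidates["diabetes"] + 0.01 * rng.normal(size=(3, d))

best, scores = infer_attribute(hidden, candidates)
print(best, {name: round(s, 3) for name, s in scores.items()})
```

Mean pooling over the window is only one possible design choice; max pooling or per-token alignment would fit the same skeleton.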