Yang Lyu

2026

Distributed LLM inference avoids sending raw inputs by transmitting intermediate hidden states, a practice widely assumed to preserve privacy. We challenge this assumption and demonstrate that intermediate representations alone are sufficient to leak sensitive user attributes. This setting poses a fundamental obstacle for existing attribute inference attacks, which typically rely on auxiliary embedding-attribute pairs. To characterize this previously underexplored privacy risk, we reformulate attribute inference as zero-shot matching over candidate attributes directly in the intermediate representation space, and introduce a purely intermediate-representation-based attribute inference attack, termed IR-AIA. To address two structural challenges that hinder attribute inference from intermediate representations, we propose SG-APCR to address layer-dependent anisotropy in intermediate embeddings and a sliding-window similarity matching strategy to handle subword-level semantic fragmentation. Experiments across three LLMs and three real-world datasets show that sensitive attributes can be reliably inferred using only intermediate representations, achieving Top-1 accuracy of up to 0.997 on CMS, 0.980 on Skytrax, and 0.986 on ECHR. These results reveal that intermediate states commonly considered safe to share can expose sensitive personal attributes on their own.

Co-authors

Yang Xiao 1

Venues

Findings1

Fix author