Xingjian Du
2026
AudioStealer: Extracting Audio Prompts via Shapley Value-Guided Query Search
Yingbin Jin | Xingjian Du | Hanjun Luo | Zihao Wang | Haibo Hu | XiaoFeng Wang | Xinfeng Li
Findings of the Association for Computational Linguistics: ACL 2026
As text-to-music models gain widespread adoption, the prompts used to guide these systems have become valuable intellectual property. This shift has given rise to a new form of attack, prompt stealing, which aims to reconstruct the high-value prompts that guide music generation. However, unlike prior work in text and image generation, prompt stealing in text-to-music systems faces unique challenges due to the entangled and diffuse nature of semantic representations in audio, which complicates the decoupling of specific textual tokens from acoustic outputs. To address these challenges, we present AudioStealer, the first targeted study of prompt inversion in the audio domain. AudioStealer operates via a two-stage black-box attack framework: first, a heuristic search guided by audio-language embeddings identifies initial candidates; then, these candidates are refined using a game-theoretic strategy based on Shapley value estimation to attribute precise semantic contributions. Our method requires no direct access to the target model and relies solely on a shadow model, making it broadly applicable. Through extensive experiments, we demonstrate that AudioStealer recovers prompts with high textual consistency to the ground truth, while the regenerated audio maintains strong perceptual similarity to the target recordings. These results expose critical vulnerabilities in the text-to-audio market ecosystem and underscore the urgent need for intellectual property protections in generative audio technologies.
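The Shapley-value refinement step described in the abstract can be illustrated with a minimal sketch. Assumptions (not taken from the paper): candidate prompt tokens are the players, and `toy_similarity` is a hypothetical additive-plus-interaction score standing in for an audio-language embedding similarity. AudioStealer's actual value function and estimator may differ. The sketch computes exact Shapley values by enumerating every coalition, plus a Monte Carlo permutation estimate, the usual approximation when exhaustive enumeration is too costly.

```python
import itertools
import math
import random
from typing import Callable, Dict, FrozenSet, Sequence

ValueFn = Callable[[FrozenSet[str]], float]

# Hypothetical per-token weights: a toy stand-in for embedding similarity.
WEIGHTS = {"jazz": 0.5, "piano": 0.3, "upbeat": 0.1, "the": 0.0}


def toy_similarity(coalition: FrozenSet[str]) -> float:
    """Hypothetical score of a token subset: additive weights plus a 0.2 bonus
    when 'jazz' and 'piano' appear together (an interaction effect)."""
    score = sum(WEIGHTS[t] for t in coalition)
    if {"jazz", "piano"} <= coalition:
        score += 0.2
    return score


def exact_shapley(players: Sequence[str], value: ValueFn) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for each coalition S without p.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def monte_carlo_shapley(players: Sequence[str], value: ValueFn,
                        num_permutations: int, seed: int = 0) -> Dict[str, float]:
    """Permutation-sampling estimate: average each player's marginal gain
    when players are added one by one in a random order."""
    rng = random.Random(seed)
    order = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            totals[p] += cur - prev
            prev = cur
    return {p: t / num_permutations for p, t in totals.items()}


if __name__ == "__main__":
    tokens = ["jazz", "piano", "upbeat", "the"]
    print(exact_shapley(tokens, toy_similarity))
    print(monte_carlo_shapley(tokens, toy_similarity, num_permutations=2000))
```

In this toy score, "jazz" and "piano" split their 0.2 interaction bonus evenly (0.6 and 0.4), "upbeat" gets 0.1 and the filler token "the" gets 0.0. The permutation estimate converges to these values, and its attributions always sum to the score of the full prompt, which is what makes Shapley values a natural fit for apportioning credit among prompt tokens.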
2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng | Eric Alcaide | Quentin Anthony | Alon Albalak | Samuel Arcadinho | Stella Biderman | Huanqi Cao | Xin Cheng | Michael Chung | Leon Derczynski | Xingjian Du | Matteo Grella | Kranthi Gv | Xuzheng He | Haowen Hou | Przemyslaw Kazienko | Jan Kocon | Jiaming Kong | Bartłomiej Koptyra | Hayden Lau | Jiaju Lin | Krishna Sri Ipsit Mantri | Ferdinand Mom | Atsushi Saito | Guangyu Song | Xiangru Tang | Johan Wind | Stanisław Woźniak | Zhenyuan Zhang | Qinghua Zhou | Jian Zhu | Rui-Jie Zhu
Findings of the Association for Computational Linguistics: EMNLP 2023
Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the performance of Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training while maintaining constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.
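The claim that one linear-attention term can be run either as a Transformer or as an RNN can be illustrated with a minimal sketch. Assumptions: a single channel, a scalar time decay `w` and "bonus" `u`, no receptance gating, and no numerical stabilization; the sketch follows an RWKV-4-style WKV weighted average as we understand it and is not the reference implementation. The parallel form recomputes a decayed weighted sum over all past positions (quadratic in sequence length), while the recurrent form carries only two running accumulators, so each inference step needs constant memory.

```python
import math
from typing import List, Sequence


def wkv_parallel(k: Sequence[float], v: Sequence[float], w: float, u: float) -> List[float]:
    """Transformer-style form: each output looks back over all earlier positions.
    Past position i gets weight exp(-(t - 1 - i) * w + k[i]); the current
    position gets the bonus weight exp(u + k[t])."""
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(k: Sequence[float], v: Sequence[float], w: float, u: float) -> List[float]:
    """RNN-style form: the same outputs from two running sums (a, b),
    i.e. O(1) state per step instead of attending over the whole history."""
    a, b = 0.0, 0.0          # decayed sums of exp(k_i) * v_i and of exp(k_i)
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        # Fold the current step into the state, decaying older contributions.
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out


if __name__ == "__main__":
    keys = [0.1 * i - 0.5 for i in range(8)]
    values = [math.sin(i) for i in range(8)]
    print(wkv_parallel(keys, values, w=0.5, u=0.3))
    print(wkv_recurrent(keys, values, w=0.5, u=0.3))
```

Both functions return the same sequence up to floating-point rounding. The parallel version is convenient for training, since all positions can be computed at once, whereas the recurrent version is what gives constant per-token cost during generation.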
Co-authors
- Alon Albalak 1
- Eric Alcaide 1
- Quentin Anthony 1
- Samuel Arcadinho 1
- Stella Biderman 1
- Huanqi Cao 1
- Xin Cheng 1
- Michael Chung 1
- Leon Derczynski 1
- Matteo Grella 1
- Kranthi Gv 1
- Xuzheng He 1
- Haowen Hou 1
- Haibo Hu 1
- Yingbin Jin 1
- Przemyslaw Kazienko 1
- Jan Kocon 1
- Jiaming Kong 1
- Bartłomiej Koptyra 1
- Hayden Lau 1
- Xinfeng Li 1
- Jiaju Lin 1
- Hanjun Luo 1
- Krishna Sri Ipsit Mantri 1
- Ferdinand Mom 1
- Bo Peng 1
- Atsushi Saito 1
- Guangyu Song 1
- Xiangru Tang 1
- Zihao Wang 1
- XiaoFeng Wang 1
- Johan Wind 1
- Stanisław Woźniak 1
- Zhenyuan Zhang 1
- Qinghua Zhou 1
- Jian Zhu 1
- Rui-Jie Zhu 1