Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models

Yisheng Zhong, Yizhu Wen, Junfeng Guo, Mehran Kafai, Heng Huang, Hanqing Guo, Zhuangdi Zhu


Abstract
The protection of cyber Intellectual Property (IP), such as web content, is an increasingly critical concern. The rise of large language models (LLMs) with online retrieval capabilities enables convenient access to information but often undermines the rights of original content creators. As users increasingly rely on LLM-generated responses, they engage less directly with original information sources. This will significantly reduce the incentives for IP creators to contribute and lead to a cyberspace saturated with AI-generated content. In response, we propose a novel defense framework that empowers web content creators to safeguard their web-based IP from unauthorized real-time extraction and redistribution by LLMs, leveraging the semantic understanding capability of LLMs themselves. Our method follows principled motivations and effectively addresses an intractable black-box optimization problem. Real-world experiments demonstrate that our method improves defense success rates from 2.5% to 88.6% on different LLMs, outperforming traditional defenses such as configuration-based restrictions.
Anthology ID: 2025.emnlp-main.870
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 17222–17235
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.870/
Cite (ACL): Yisheng Zhong, Yizhu Wen, Junfeng Guo, Mehran Kafai, Heng Huang, Hanqing Guo, and Zhuangdi Zhu. 2025. Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 17222–17235, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models (Zhong et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.870.pdf
Checklist: 2025.emnlp-main.870.checklist.pdf