Hongye Jin
2024
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion
Guanchu Wang | Yu-Neng Chuang | Ruixiang Tang | Shaochen Zhong | Jiayi Yuan | Hongye Jin | Zirui Liu | Vipin Chaudhary | Shuai Xu | James Caverlee | Xia Hu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Ensuring the security of released large language models (LLMs) poses a significant dilemma, as existing mechanisms either compromise ownership rights or raise data privacy concerns. To address this dilemma, we introduce TaylorMLP to protect the ownership of released LLMs and prevent their abuse. Specifically, TaylorMLP preserves the ownership of LLMs by transforming their weights into the parameters of a Taylor series. Instead of releasing the original weights, developers can release the Taylor-series parameters to users, thereby ensuring the security of LLMs. Moreover, TaylorMLP can prevent abuse of LLMs by adjusting the generation speed: increasing the number of terms in the Taylor series induces low-speed token generation for the protected LLMs. This intentional delay helps LLM developers prevent potential large-scale unauthorized use of their models. Empirical experiments across five datasets and three LLM architectures demonstrate that TaylorMLP induces over increase in latency while producing tokens that precisely match those of the original LLMs. Subsequent defensive experiments further confirm that TaylorMLP effectively prevents users from reconstructing the weight values from downstream datasets.
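The latency mechanism described above rests on a general property of truncated Taylor series: each additional term costs extra computation while bringing the approximation closer to the expanded function. The toy below illustrates only that trade-off. It expands a hypothetical one-dimensional unit `y(x) = w2 * exp(w1 * x)` around `x0` and evaluates coefficients `c_n` in place of `w1` and `w2`; the unit, the `exp` activation, and the names `taylor_coefficients` and `eval_series` are assumptions for illustration, not TaylorMLP's actual construction.

```python
import math


def taylor_coefficients(w1: float, w2: float, x0: float, n_terms: int) -> list:
    """Coefficients c_n of the hypothetical toy unit y(x) = w2 * exp(w1 * x) around x0.

    Because the n-th derivative of exp(w1 * x) is w1**n * exp(w1 * x),
    c_n = w2 * w1**n * exp(w1 * x0) / n!.
    """
    base = w2 * math.exp(w1 * x0)
    return [base * w1**n / math.factorial(n) for n in range(n_terms)]


def eval_series(coeffs: list, x0: float, x: float) -> float:
    """Evaluate sum_n c_n * (x - x0)**n with Horner's rule (one multiply-add per term)."""
    dx = x - x0
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * dx + c
    return acc


if __name__ == "__main__":
    w1, w2, x0, x = 0.8, 1.5, 0.0, 1.0  # hypothetical toy values
    exact = w2 * math.exp(w1 * x)
    for n_terms in (3, 6, 12, 20):
        approx = eval_series(taylor_coefficients(w1, w2, x0, n_terms), x0, x)
        # More terms: more multiply-adds per call (slower), smaller truncation error.
        print(f"{n_terms:2d} terms  |error| = {abs(approx - exact):.2e}")
```

In this scalar toy the weights remain recoverable from the coefficients (`c_{n+1} / c_n = w1 / (n + 1)`), so it says nothing about the reconstruction resistance reported in the abstract; that claim rests on TaylorMLP's construction for full MLP layers, which this sketch does not reproduce.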
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan | Hongyi Liu | Shaochen Zhong | Yu-Neng Chuang | Songchen Li | Guanchu Wang | Duy Le | Hongye Jin | Vipin Chaudhary | Zhaozhuo Xu | Zirui Liu | Xia Hu
Findings of the Association for Computational Linguistics: EMNLP 2024
Long context capability is a crucial competency for large language models (LLMs), as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many other tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the growing size of the KV cache and the intrinsic complexity of attending to extended inputs. Multiple schools of efficiency-driven approaches, such as KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures, have been proposed to produce efficient yet long context-capable models. Despite these advancements, no existing work has comprehensively benchmarked these methods in a reasonably aligned environment. In this work, we fill this gap by providing a taxonomy of current methods and evaluating 10+ state-of-the-art approaches across seven categories of long context tasks. Our work reveals numerous previously unknown phenomena and offers insights, as well as a friendly workbench, for the future development of long context-capable LLMs. The source code is available at https://github.com/henryzhongsc/longctx_bench.
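The abstract names the growing KV cache as one source of difficulty. A back-of-the-envelope sketch shows why: cache memory grows linearly with sequence length, and KV cache quantization (one of the listed approaches) shrinks it roughly in proportion to the bit width. The helper `kv_cache_bytes` and the Llama-2-7B-like shape (32 layers, 32 KV heads, head dimension 128) are illustrative assumptions, not settings taken from the benchmark.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bits_per_elem: int = 16) -> int:
    """Memory held by the key and value caches of a decoder-only transformer.

    Hypothetical helper. The factor of 2 counts one key tensor and one value
    tensor per layer. Quantization metadata (scales, zero points) is ignored.
    """
    num_elems = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_elems * bits_per_elem // 8


if __name__ == "__main__":
    GiB = 2**30
    # Assumed Llama-2-7B-like shape (no grouped-query attention).
    shape = dict(num_layers=32, num_kv_heads=32, head_dim=128)
    for seq_len in (4_096, 32_768, 131_072):
        fp16 = kv_cache_bytes(**shape, seq_len=seq_len) / GiB
        int4 = kv_cache_bytes(**shape, seq_len=seq_len, bits_per_elem=4) / GiB
        print(f"{seq_len:>7} tokens: fp16 {fp16:6.1f} GiB, 4-bit {int4:5.1f} GiB")
```

Under these assumptions, a single 32K-token sequence needs 16 GiB of fp16 cache versus 4 GiB at 4 bits, before counting quantization metadata; token dropping and prompt compression instead reduce the `seq_len` factor.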