Abe Hou
2024
SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation
Abe Hou | Jingyu Zhang | Tianxing He | Yichen Wang | Yung-Sung Chuang | Hongwei Wang | Lingfeng Shen | Benjamin Van Durme | Daniel Khashabi | Yulia Tsvetkov
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Existing watermarked generation algorithms employ token-level designs and are therefore vulnerable to paraphrase attacks. To address this issue, we introduce watermarking on the semantic representation of sentences. We propose SemStamp, a robust sentence-level semantic watermarking algorithm that uses locality-sensitive hashing (LSH) to partition the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by a language model, and conducts rejection sampling until the sampled sentence falls in a watermarked partition of the semantic embedding space. To test the paraphrastic robustness of watermarking algorithms, we propose a “bigram paraphrase” attack that produces paraphrases with small bigram overlap with the original sentence. This attack is shown to be effective against existing token-level watermark algorithms, while causing only minor degradation to SemStamp. Experimental results show that our novel semantic watermark algorithm is not only more robust than the previous state-of-the-art method across various paraphrasers and domains, but also better at preserving generation quality.
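The generation loop described in the abstract (embed a candidate sentence, LSH-hash it, and resample until it lands in a watermarked partition) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the LSH family, the encoder, or how watermarked partitions are chosen, so several pieces here are assumptions: random-hyperplane (sign) LSH, the choice to seed the watermarked partitions from the previous sentence's signature, and the `ratio` parameter. `toy_embed` is a hypothetical stand-in for a real sentence encoder and `propose` for a language model's sentence sampler.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random normal vectors; each one splits the space in two."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vec, hyperplanes):
    """Assumed LSH family: one bit per hyperplane, set when the dot product is positive."""
    sig = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vec, plane))
        sig = (sig << 1) | (1 if dot > 0 else 0)
    return sig


def valid_partitions(prev_sig, num_bits, ratio=0.25):
    """Assumption: pick the watermarked partitions pseudo-randomly,
    seeded by the signature of the previous sentence."""
    rng = random.Random(prev_sig)
    regions = list(range(2 ** num_bits))
    rng.shuffle(regions)
    keep = max(1, int(ratio * len(regions)))
    return set(regions[:keep])


def toy_embed(sentence, dim):
    """Hypothetical stand-in for a sentence encoder: a deterministic
    pseudo-random vector derived from the sentence text."""
    rng = random.Random(sentence)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sample_watermarked(propose, embed, hyperplanes, valid, max_tries=100):
    """Rejection sampling: keep proposing sentences until one hashes into a
    watermarked partition. Falls back to the last candidate if none does."""
    cand = None
    for _ in range(max_tries):
        cand = propose()
        if lsh_signature(embed(cand), hyperplanes) in valid:
            return cand
    return cand
```

Under these assumptions, detection would recompute each sentence's signature with the same hyperplanes and count how many sentences fall in their watermarked partitions; a paraphrase that stays close in embedding space tends to keep the same signature, which is the intuition behind the robustness claim.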