PrivaT5: A Generative Language Model for Privacy Policies

Mohammad Zoubi, Santosh T.y.s.s, Edgar Rosas, Matthias Grabmair


Abstract
In the era of digital privacy, users often neglect to read privacy policies due to their complexity. To bridge this gap, NLP models have emerged to assist in understanding privacy policies. While recent generative language models like BART and T5 have shown prowess in text generation and in discriminative tasks framed as generative ones, their application to privacy policy domain tasks remains unexplored. To address this, we introduce PrivaT5, a T5-based model that is further pre-trained on privacy policy text. We evaluate PrivaT5 on a diverse set of privacy-policy-related tasks and observe superior performance over T5, showing the utility of continued domain-specific pre-training. Our results also highlight challenges these generative models face with complex, structured output label spaces, especially in sequence tagging tasks, where they fall short of lighter encoder-only models.
Anthology ID:
2024.privatenlp-1.16
Volume:
Proceedings of the Fifth Workshop on Privacy in Natural Language Processing
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
Venues:
PrivateNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
159–169
URL:
https://aclanthology.org/2024.privatenlp-1.16
Cite (ACL):
Mohammad Zoubi, Santosh T.y.s.s, Edgar Rosas, and Matthias Grabmair. 2024. PrivaT5: A Generative Language Model for Privacy Policies. In Proceedings of the Fifth Workshop on Privacy in Natural Language Processing, pages 159–169, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
PrivaT5: A Generative Language Model for Privacy Policies (Zoubi et al., PrivateNLP-WS 2024)
PDF:
https://preview.aclanthology.org/autopr/2024.privatenlp-1.16.pdf