Building a Long Text Privacy Policy Corpus with Multi-Class Labels

Florencia Marotta-Wurgler, David Stein


Abstract
Legal text poses distinctive challenges for natural language processing. The legal import of a term may depend on omissions, cross-references, or silence. Further, legal text is often susceptible to multiple valid, conflicting interpretations; as the saying goes, a good lawyer’s answer to any question is “it depends.” This work introduces a new, hand-coded dataset for the interpretation of privacy policies. It includes privacy policies from 149 firms, together with materials incorporated by reference. The policies are annotated across 64 dimensions that reflect the applicable legal rules and contested terms from EU and US privacy regulation and litigation. Our annotation methodology is designed to capture the core challenges peculiar to legal language, including indeterminacy, interdependence between clauses, meaningful silence, and the implications of legal defaults. We present a set of baseline results for the dataset using current large language models.
Anthology ID:
2025.acl-long.401
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8156–8219
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.401/
Cite (ACL):
Florencia Marotta-Wurgler and David Stein. 2025. Building a Long Text Privacy Policy Corpus with Multi-Class Labels. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8156–8219, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Building a Long Text Privacy Policy Corpus with Multi-Class Labels (Marotta-Wurgler & Stein, ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.401.pdf