Automated Detection and Analysis of Data Practices Using A Real-World Corpus

Mukund Srinath, Pranav Narayanan Venkit, Maria Badillo, Florian Schaub, C. Giles, Shomir Wilson


Abstract
Privacy policies are crucial for informing users about data practices, yet their length and complexity often deter users from reading them. In this paper, we propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail. Leveraging crowd-sourced annotations from the ToS;DR platform, we experiment with various methods to match policy excerpts with predefined data practice descriptions. We further conduct a case study to evaluate our approach on a real-world policy, demonstrating its effectiveness in simplifying complex policies. Experiments show that our approach accurately matches data practice descriptions with policy excerpts, facilitating the presentation of simplified privacy information to users.
Anthology ID:
2024.findings-acl.271
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4567–4574
URL:
https://aclanthology.org/2024.findings-acl.271
DOI:
10.18653/v1/2024.findings-acl.271
Cite (ACL):
Mukund Srinath, Pranav Narayanan Venkit, Maria Badillo, Florian Schaub, C. Giles, and Shomir Wilson. 2024. Automated Detection and Analysis of Data Practices Using A Real-World Corpus. In Findings of the Association for Computational Linguistics: ACL 2024, pages 4567–4574, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Automated Detection and Analysis of Data Practices Using A Real-World Corpus (Srinath et al., Findings 2024)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-acl.271.pdf