Generalizing Unmasking for Short Texts
Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast
Abstract
Authorship verification is the problem of inferring whether two texts were written by the same author. For this task, unmasking is one of the most robust approaches as of today with the major shortcoming of only being applicable to book-length texts. In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. Our generalized approach therefore reduces the required material by orders of magnitude, making unmasking applicable to authorship cases of more practical proportions. The new approach is on par with other state-of-the-art techniques that are optimized for texts of this length: it achieves accuracies of 75–80%, while also allowing for easy adjustment to forensic scenarios that require higher levels of confidence in the classification.
- Anthology ID:
- N19-1068
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Editors:
- Jill Burstein, Christy Doran, Thamar Solorio
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 654–659
- URL:
- https://aclanthology.org/N19-1068
- DOI:
- 10.18653/v1/N19-1068
- Cite (ACL):
- Janek Bevendorff, Benno Stein, Matthias Hagen, and Martin Potthast. 2019. Generalizing Unmasking for Short Texts. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 654–659, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Generalizing Unmasking for Short Texts (Bevendorff et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/naacl24-info/N19-1068.pdf
- Code:
- webis-de/NAACL-19