Authorship Attribution with Convolutional Neural Networks and POS-Eliding

Julian Hitschler, Esther van den Berg, Ines Rehbein


Abstract
We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. To investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS tags. This procedure improves test performance because the model generalizes better to unseen data. Using our method, we are able to predict the authors of scientific publications in the same discipline at levels well above chance.
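The POS-eliding step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: sentences arrive already POS-tagged by some external tagger, word frequencies are counted on the training data only, words are lowercased before counting, and the cutoff is a simple minimum count. The helper names `build_vocab` and `pos_elide` and the parameter `min_count` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical input format: one sentence is a sequence of (word, POS tag) pairs
# produced by an external tagger (assumption, not specified in the abstract).
TaggedSentence = Sequence[Tuple[str, str]]


def build_vocab(corpus: Iterable[TaggedSentence], min_count: int) -> Set[str]:
    """Return the lowercased words occurring at least `min_count` times.

    Assumption: counts come from the training corpus only, so the
    frequency threshold does not leak information from test data.
    """
    counts = Counter(word.lower() for sent in corpus for word, _ in sent)
    return {word for word, count in counts.items() if count >= min_count}


def pos_elide(sentence: TaggedSentence, vocab: Set[str]) -> List[str]:
    """Keep frequent words; replace every rarer word by its POS tag."""
    return [word.lower() if word.lower() in vocab else tag for word, tag in sentence]


if __name__ == "__main__":
    train = [
        [("The", "DT"), ("model", "NN"), ("predicts", "VBZ"), ("authors", "NNS")],
        [("The", "DT"), ("model", "NN"), ("uses", "VBZ"), ("convolutions", "NNS")],
    ]
    vocab = build_vocab(train, min_count=2)
    print(sorted(vocab))              # ['model', 'the']
    print(pos_elide(train[1], vocab))  # ['the', 'model', 'VBZ', 'NNS']
```

With `min_count=2`, only "the" and "model" survive; the rarer, more topic-specific words are reduced to their tags, which is the kind of domain signal the abstract describes obscuring. Because words are lowercased and Penn-style tags are uppercase, an elided tag cannot collide with a retained word in this sketch.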
Anthology ID:
W17-4907
Volume:
Proceedings of the Workshop on Stylistic Variation
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Julian Brooke, Thamar Solorio, Moshe Koppel
Venue:
Style-Var
Publisher:
Association for Computational Linguistics
Pages:
53–58
URL:
https://aclanthology.org/W17-4907
DOI:
10.18653/v1/W17-4907
Cite (ACL):
Julian Hitschler, Esther van den Berg, and Ines Rehbein. 2017. Authorship Attribution with Convolutional Neural Networks and POS-Eliding. In Proceedings of the Workshop on Stylistic Variation, pages 53–58, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Authorship Attribution with Convolutional Neural Networks and POS-Eliding (Hitschler et al., Style-Var 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/W17-4907.pdf