Abstract
Most approaches to author profiling and author identification focus mainly on lexical features, i.e., on the content of a text. We argue that syntactic and discourse features play a significantly more prominent role than they have been given in the past. We show that they achieve state-of-the-art performance in author and gender identification on a literary corpus while keeping the feature set small: the feature set consists of only 188 features and still outperforms the winner of the PAN 2014 shared task on author verification in the literary genre.
- Anthology ID:
- E17-2108
- Volume:
- Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 681–687
- URL:
- https://aclanthology.org/E17-2108
- Cite (ACL):
- Juan Soler-Company and Leo Wanner. 2017. On the Relevance of Syntactic and Discourse Features for Author Profiling and Identification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 681–687, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- On the Relevance of Syntactic and Discourse Features for Author Profiling and Identification (Soler-Company & Wanner, EACL 2017)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/E17-2108.pdf