Abstract
Over the last few years, author profiling in general and author gender identification in particular have become popular research areas because of their attractive potential applications, which range from forensic investigations to online marketing studies. However, nearly all state-of-the-art work in the area still depends heavily on the datasets it was trained and tested on, since it draws largely on content features, mostly a large number of recurrent words or word combinations extracted from the training sets. We show that, using a small number of features that mainly depend on the structure of the texts, we can outperform approaches that depend mainly on the content of the texts and use a huge number of features to identify whether the author of a text is a man or a woman. Our system has been tested against a dataset constructed for our work as well as against two datasets that were previously used in other papers.
- Anthology ID:
- L14-1030
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1315–1319
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/104_Paper.pdf
- Cite (ACL):
- Juan Soler Company and Leo Wanner. 2014. How to Use less Features and Reach Better Performance in Author Gender Identification. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1315–1319, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- How to Use less Features and Reach Better Performance in Author Gender Identification (Soler Company & Wanner, LREC 2014)