Abstract
We present domain-independent models to date documents based only on neologism usage patterns. Our models capture patterns of neologism usage over time to date texts, provide insights into temporal locality of word usage over a span of 150 years, and generalize to various domains such as News, Fiction, and Non-Fiction with competitive performance. Quite intriguingly, we show that by modeling only the distribution of usage counts over neologisms (the model being agnostic of the particular words themselves), we achieve competitive performance using several orders of magnitude fewer features (only 200 input features) than state-of-the-art models, some of which use 200K features.
- Anthology ID:
- C18-1017
- Volume:
- Proceedings of the 27th International Conference on Computational Linguistics
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Emily M. Bender, Leon Derczynski, Pierre Isabelle
- Venue:
- COLING
- Publisher:
- Association for Computational Linguistics
- Pages:
- 202–212
- URL:
- https://aclanthology.org/C18-1017
- Cite (ACL):
- Vivek Kulkarni, Yingtao Tian, Parth Dandiwala, and Steve Skiena. 2018. Simple Neologism Based Domain Independent Models to Predict Year of Authorship. In Proceedings of the 27th International Conference on Computational Linguistics, pages 202–212, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Simple Neologism Based Domain Independent Models to Predict Year of Authorship (Kulkarni et al., COLING 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/C18-1017.pdf