@inproceedings{sundararajan-woodard-2018-represents,
    title = "What represents ``style'' in authorship attribution?",
    author = "Sundararajan, Kalaivani  and
      Woodard, Damon",
    editor = "Bender, Emily M.  and
      Derczynski, Leon  and
      Isabelle, Pierre",
    booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
    month = aug,
    year = "2018",
    address = "Santa Fe, New Mexico, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/display_plenaries/C18-1238/",
    pages = "2814--2822",
    abstract = "Authorship attribution typically uses all information representing both content and style whereas attribution based only on stylistic aspects may be robust in cross-domain settings. This paper analyzes different linguistic aspects that may help represent style. Specifically, we study the role of syntax and lexical words (nouns, verbs, adjectives and adverbs) in representing style. We use a purely syntactic language model to study the significance of sentence structures in both single-domain and cross-domain attribution, i.e. cross-topic and cross-genre attribution. We show that syntax may be helpful for cross-genre attribution while cross-topic attribution and single-domain may benefit from additional lexical information. Further, pure syntactic models may not be effective by themselves and need to be used in combination with other robust models. To study the role of word choice, we perform attribution by masking all words or specific topic words corresponding to nouns, verbs, adjectives and adverbs. Using a single-domain dataset, IMDB1M reviews, we demonstrate the heavy influence of common nouns and proper nouns in attribution, thereby highlighting topic interference. Using cross-domain Guardian10 dataset, we show that some common nouns, verbs, adjectives and adverbs may help with stylometric attribution as demonstrated by masking topic words corresponding to these parts-of-speech. As expected, it was observed that proper nouns are heavily influenced by content and cross-domain attribution will benefit from completely masking them."
}

Markdown (Informal)
[What represents “style” in authorship attribution?](https://preview.aclanthology.org/display_plenaries/C18-1238/) (Sundararajan & Woodard, COLING 2018)
ACL
- Kalaivani Sundararajan and Damon Woodard. 2018. What represents “style” in authorship attribution? In Proceedings of the 27th International Conference on Computational Linguistics, pages 2814–2822, Santa Fe, New Mexico, USA. Association for Computational Linguistics.