Abstract
This pilot study explores the potential of using WEKA in forensic authorship analysis. It is a corpus-based study using Twitter data collected from thirteen authors from Riyadh, Saudi Arabia. It examines the performance of unbalanced and balanced data sets using different classifiers and word-gram parameters. The attributes are dialect-specific linguistic features categorized as word grams. The findings further support previous studies in computational authorship identification.

- Anthology ID:
- 2020.icon-main.34
- Volume:
- Proceedings of the 17th International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2020
- Address:
- Indian Institute of Technology Patna, Patna, India
- Editors:
- Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
- Venue:
- ICON
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 257–260
- URL:
- https://aclanthology.org/2020.icon-main.34
- Cite (ACL):
- Mashael AlAmr and Eric Atwell. 2020. WEKA in Forensic Authorship Analysis: A corpus-based approach of Saudi Authors. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 257–260, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
- Cite (Informal):
- WEKA in Forensic Authorship Analysis: A corpus-based approach of Saudi Authors (AlAmr & Atwell, ICON 2020)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2020.icon-main.34.pdf