Can Large Language Models Identify Authorship?

Baixiang Huang; Canyu Chen; Kai Shu

doi:10.18653/v1/2024.findings-emnlp.26

Abstract

The ability to accurately identify authorship is crucial for verifying content authenticity and mitigating misinformation. Large Language Models (LLMs) have demonstrated exceptional capacity for reasoning and problem-solving. However, their potential in authorship analysis remains under-explored. Traditional studies have depended on hand-crafted stylistic features, whereas state-of-the-art approaches leverage text embeddings from pre-trained language models. These methods, which typically require fine-tuning on labeled data, often suffer from performance degradation in cross-domain applications and provide limited explainability. This work seeks to address three research questions: (1) Can LLMs perform zero-shot, end-to-end authorship verification effectively? (2) Are LLMs capable of accurately attributing authorship among multiple candidate authors (e.g., 10 and 20)? (3) Can LLMs provide explainability in authorship analysis, particularly through the role of linguistic features? Moreover, we investigate the integration of explicit linguistic features to guide LLMs in their reasoning processes. Our assessment demonstrates LLMs’ proficiency in both tasks without the need for domain-specific fine-tuning, and shows that they can explain their decisions through a detailed analysis of linguistic features. This establishes a new benchmark for future research on LLM-based authorship analysis.
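
To make the two tasks in the abstract concrete, the following is a minimal sketch of how zero-shot prompts for authorship verification and attribution might be assembled, with an optional instruction that steers the model toward linguistic features. Everything here is an assumption for illustration: the function names, the prompt wording, and the feature list in `LINGUISTIC_HINTS` are hypothetical and are not the authors' actual prompts. The sketch only builds prompt strings and does not call any model.

```python
from typing import List

# Hypothetical feature list; the abstract does not enumerate the features used.
LINGUISTIC_HINTS = "punctuation, word choice, sentence structure, and tone"


def _guidance(use_linguistic_hints: bool) -> str:
    """Return the optional style-focused instruction (empty if disabled)."""
    if not use_linguistic_hints:
        return ""
    return (
        f" Base your analysis on writing style, such as {LINGUISTIC_HINTS},"
        " rather than on topic."
    )


def verification_prompt(text_a: str, text_b: str,
                        use_linguistic_hints: bool = False) -> str:
    """Build a zero-shot prompt asking whether two texts share an author."""
    return (
        "Decide whether the two texts below were written by the same author."
        f"{_guidance(use_linguistic_hints)}\n"
        "Answer 'yes' or 'no', then briefly explain your reasoning.\n\n"
        f"Text 1: {text_a}\n\n"
        f"Text 2: {text_b}"
    )


def attribution_prompt(query: str, candidate_texts: List[str],
                       use_linguistic_hints: bool = False) -> str:
    """Build a zero-shot prompt asking which candidate wrote the query text.

    Each entry in candidate_texts is one writing sample from one candidate.
    """
    samples = "\n\n".join(
        f"Candidate {i}: {sample}"
        for i, sample in enumerate(candidate_texts, start=1)
    )
    return (
        "Identify which candidate author most likely wrote the query text."
        f"{_guidance(use_linguistic_hints)}\n"
        "Answer with the candidate number, then briefly explain your reasoning.\n\n"
        f"Query text: {query}\n\n"
        f"{samples}"
    )
```

The resulting strings could be sent to any chat-style LLM; asking for a short explanation alongside the answer mirrors the explainability question raised in the abstract.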

Anthology ID: 2024.findings-emnlp.26
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 445–460
URL: https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.26/
DOI: 10.18653/v1/2024.findings-emnlp.26
Cite (ACL): Baixiang Huang, Canyu Chen, and Kai Shu. 2024. Can Large Language Models Identify Authorship? In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 445–460, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Can Large Language Models Identify Authorship? (Huang et al., Findings 2024)
PDF: https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.26.pdf