Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works

Yaru Wu; Yuri Bizzoni; Pascale Feldkamp; Kristoffer Nielbo

Abstract

This study extends previous research on literary quality by using information-theoretic methods to assess the perplexity that three large language models record when processing 20th-century English novels deemed to have high literary quality and recognized by experts as canonical, compared with a broader control group. We find that canonical texts appear to elicit higher perplexity in the models, and we explore which textual features might combine to create this effect. We find that a more heavily nominal style, together with a more diverse vocabulary, is one of the leading causes of the difference between the two groups. These traits could reflect “strategies” to achieve an informationally dense literary style.
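For readers unfamiliar with the measure, perplexity is conventionally defined as the exponential of the mean negative log-probability a language model assigns to each token. The sketch below illustrates that standard definition only; it is not taken from the paper, and the function name `perplexity` and the example log-probabilities are hypothetical. In practice the per-token log-probabilities would come from a language model.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-probability), using natural logs.

    `token_logprobs` is a hypothetical list of per-token log-probabilities,
    assumed to have been produced by some language model.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Made-up values for illustration: each token gets probability 0.5 in the
# first sequence and 0.1 in the second.
predictable = [math.log(0.5)] * 4
surprising = [math.log(0.1)] * 4

print(perplexity(predictable))
print(perplexity(surprising))
```

Under this definition, a text whose tokens the model finds less predictable receives a higher perplexity, which is the sense in which the abstract compares canonical texts with the control group.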

Anthology ID: 2024.latechclfl-1.16
Volume: Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month: March
Year: 2024
Address: St. Julians, Malta
Editors: Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues: LaTeCHCLfL | WS
Publisher: Association for Computational Linguistics
Pages: 172–184
URL: https://preview.aclanthology.org/ingest-emnlp/2024.latechclfl-1.16/
Cite (ACL): Yaru Wu, Yuri Bizzoni, Pascale Moreira, and Kristoffer Nielbo. 2024. Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 172–184, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal): Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works (Wu et al., LaTeCHCLfL 2024)
PDF: https://preview.aclanthology.org/ingest-emnlp/2024.latechclfl-1.16.pdf
Supplementary material: 2024.latechclfl-1.16.SupplementaryMaterial.zip
Video: https://preview.aclanthology.org/ingest-emnlp/2024.latechclfl-1.16.mp4
