AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora

Aleksandr Fedchin, Isabel Cooperman, Pramit Chaudhuri, Joseph P. Dexter


Abstract
For centuries, writers have hidden messages as acrostics, in which initial letters of consecutive lines or paragraphs form meaningful words or phrases. Scholars searching for acrostics manually can only focus on a few authors at a time and often favor qualitative arguments about whether a given acrostic is accidental or intentional. Here we describe AcrosticSleuth, a first-of-its-kind approach to identify acrostics automatically and rank them by the probability that the corresponding sequence of characters does not occur by chance. Since acrostics are rare, we formalize the problem as a binary classification task in the presence of extreme class imbalance. To evaluate AcrosticSleuth, we present the Acrostic Identification Dataset (AcrostID), a collection of acrostics from the WikiSource online database. Despite the class imbalance, AcrosticSleuth achieves F1 scores of 0.39, 0.59, and 0.66 on the French, English, and Russian subdomains of WikiSource, respectively. We further demonstrate that AcrosticSleuth can identify previously unknown instances of wordplay in high-profile literary contexts, including the English philosopher Thomas Hobbes’ signature in the opening paragraphs of The Elements of Law.
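The abstract summarizes the idea of ranking letter sequences by how unlikely they are to arise by chance, but it does not spell out a model. The Python sketch below illustrates the general idea under a deliberately simple null model chosen only for illustration. It is not the AcrosticSleuth model. Each line initial is treated as an independent draw from the letter frequencies of the initials themselves. A vocabulary word found among consecutive initials is then scored by its expected number of chance occurrences, where a lower score means the word is less likely to be accidental. The function names `initials`, `chance_count`, and `rank_acrostics`, the `vocabulary` argument, and the `min_len` cutoff are all hypothetical.

```python
from collections import Counter


def initials(lines):
    """Collect the first alphabetic character of each line, upper-cased.

    Lines with no letters are skipped, so positions index the initials string,
    not the original line numbers.
    """
    chars = []
    for line in lines:
        for ch in line:
            if ch.isalpha():
                chars.append(ch.upper())
                break
    return "".join(chars)


def chance_count(word, freqs, n):
    """Expected number of occurrences of `word` in a random string of n initials.

    Hypothetical null model: each initial is drawn independently from the
    observed initial-letter frequencies `freqs`.
    """
    p = 1.0
    for ch in word:
        p *= freqs.get(ch, 0) / n
    return max(n - len(word) + 1, 0) * p


def rank_acrostics(lines, vocabulary, min_len=4):
    """Find vocabulary words spelled by consecutive initials and rank them.

    Returns (word, position, expected_chance_count) tuples. The smallest
    expected count, meaning the least likely to be accidental, comes first.
    """
    seq = initials(lines)
    n = len(seq)
    freqs = Counter(seq)
    hits = []
    for word in vocabulary:
        target = word.upper()
        if len(target) < min_len:
            continue
        start = seq.find(target)
        while start != -1:  # record every occurrence, including overlapping ones
            hits.append((target, start, chance_count(target, freqs, n)))
            start = seq.find(target, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    poem = [
        "Hidden in plain sight",
        "In every line",
        "Dwells a quiet",
        "Echo of a name",
        "Now you see it",
    ]
    for word, pos, expected in rank_acrostics(poem, ["hide", "den", "ink"], min_len=3):
        print(f"{word} at position {pos}: {expected:.4f}")
    # HIDE at position 0: 0.0032
    # DEN at position 2: 0.0240
```

Under this toy null model, longer words and words built from rare initials score lower and so rank higher. Because `str.isalpha` and `str.upper` are Unicode-aware, the same code also runs on French or Russian text. What it lacks is any realistic language model of which letter sequences count as meaningful.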
Anthology ID: 2025.findings-naacl.414
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7430–7437
URL: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.414/
Cite (ACL): Aleksandr Fedchin, Isabel Cooperman, Pramit Chaudhuri, and Joseph P. Dexter. 2025. AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7430–7437, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora (Fedchin et al., Findings 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.414.pdf