Isabel Cooperman
2025
AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora
Aleksandr Fedchin
|
Isabel Cooperman
|
Pramit Chaudhuri
|
Joseph P. Dexter
Findings of the Association for Computational Linguistics: NAACL 2025
For centuries, writers have hidden messages as acrostics, in which initial letters of consecutive lines or paragraphs form meaningful words or phrases. Scholars searching for acrostics manually can only focus on a few authors at a time and often favor qualitative arguments about whether a given acrostic is accidental or intentional. Here we describe AcrosticSleuth, a first-of-its-kind approach to identify acrostics automatically and rank them by the probability that the corresponding sequence of characters does not occur by chance. Since acrostics are rare, we formalize the problem as a binary classification task in the presence of extreme class imbalance. To evaluate AcrosticSleuth, we present the Acrostic Identification Dataset (AcrostID), a collection of acrostics from the WikiSource online database. Despite the class imbalance, AcrosticSleuth achieves F1 scores of 0.39, 0.59, and 0.66 on the French, English, and Russian subdomains of WikiSource, respectively. We further demonstrate that AcrosticSleuth can identify previously unknown instances of wordplay in high-profile literary contexts, including the English philosopher Thomas Hobbes’ signature in the opening paragraphs of The Elements of Law.