Decipherment as Regression: Solving Historical Substitution Ciphers by Learning Symbol Recurrence Relations

Nishant Kambhatla; Logan Born; Anoop Sarkar

doi:10.18653/v1/2023.findings-eacl.160

Abstract

Solving substitution ciphers involves mapping sequences of cipher symbols to fluent text in a target language. This has conventionally been formulated as a search problem, to find the decipherment key using a character-level language model to constrain the search space. This work instead frames decipherment as a sequence prediction task, using a Transformer-based causal language model to learn recurrences between characters in a ciphertext. We introduce a novel technique for transcribing arbitrary substitution ciphers into a common recurrence encoding. By leveraging this technique, we (i) create a large synthetic dataset of homophonic ciphers using random keys, and (ii) train a decipherment model that predicts the plaintext sequence given a recurrence-encoded ciphertext. Our method achieves strong results on synthetic 1:1 and homophonic ciphers, and cracks several real historic homophonic ciphers. Our analysis shows that the model learns recurrence relations between cipher symbols and recovers decipherment keys in its self-attention.
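The idea of a key-independent encoding can be illustrated with a small sketch. The abstract does not specify the recurrence encoding itself, so the code below substitutes a simpler, well-known stand-in: each cipher symbol is relabelled by the order of its first appearance, so that two ciphertexts differing only in their choice of symbols map to the same integer sequence. The function names (`make_homophonic_key`, `encipher`, `first_occurrence_encoding`) and the `max_homophones` parameter are hypothetical and are not taken from the paper; the random-key generator is likewise only an assumption about how synthetic homophonic ciphers might be produced.

```python
import random


def make_homophonic_key(alphabet, max_homophones, rng):
    """Build a random homophonic key (hypothetical helper, not the paper's code).

    Each plaintext letter receives between 1 and max_homophones cipher
    symbols; no cipher symbol is shared between two letters.
    """
    key, next_symbol = {}, 0
    for letter in alphabet:
        count = rng.randint(1, max_homophones)
        key[letter] = list(range(next_symbol, next_symbol + count))
        next_symbol += count
    return key


def encipher(plaintext, key, rng):
    """Replace each plaintext letter with one of its homophones, chosen at random."""
    return [rng.choice(key[ch]) for ch in plaintext]


def first_occurrence_encoding(cipher):
    """Relabel symbols by order of first appearance (illustrative stand-in only).

    The output depends on where symbols recur, not on which symbols the
    original key happened to use.
    """
    seen = {}
    encoded = []
    for symbol in cipher:
        if symbol not in seen:
            seen[symbol] = len(seen)
        encoded.append(seen[symbol])
    return encoded


if __name__ == "__main__":
    rng = random.Random(0)
    key = make_homophonic_key("abcdefghijklmnopqrstuvwxyz", 3, rng)
    cipher = encipher("attackatdawn", key, rng)
    print(first_occurrence_encoding(cipher))
```

Because the encoding discards symbol identity, ciphers produced with different random keys land in one shared input vocabulary, which is the property the abstract attributes to its common encoding.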

Anthology ID: 2023.findings-eacl.160
Volume: Findings of the Association for Computational Linguistics: EACL 2023
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Andreas Vlachos, Isabelle Augenstein
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2136–2152
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.findings-eacl.160/
DOI: 10.18653/v1/2023.findings-eacl.160
Cite (ACL): Nishant Kambhatla, Logan Born, and Anoop Sarkar. 2023. Decipherment as Regression: Solving Historical Substitution Ciphers by Learning Symbol Recurrence Relations. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2136–2152, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Decipherment as Regression: Solving Historical Substitution Ciphers by Learning Symbol Recurrence Relations (Kambhatla et al., Findings 2023)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.findings-eacl.160.pdf
Video: https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.findings-eacl.160.mp4
