May 2026

Calibrated Falsification of Prior Dorabella Decryptions: A Negative Result on Substitution-Consistency and Language Identification

cryptanalysis, dorabella, computational-linguistics, falsification, substitution-cipher

Abstract

We apply a calibration-first falsification pipeline to Edward Elgar's 1897 Dorabella Cipher and to three prior published Dorabella decryptions, plus one adjacent claim on the Liszt Fragment (a separate 18-symbol Elgar artifact that uses the same alphabet).

The first negative result concerns the only length-matched prior reading. Packwood (2020) produces an 87-character English plaintext that, aligned position-by-position with the cipher, requires 15 of the 20 distinct cipher symbols to encode more than one plaintext letter, with one symbol encoding eight different letters within the same text. Both simple substitution and homophonic substitution require every cipher symbol to decrypt to a single letter (homophonic schemes give one letter several symbols, never one symbol several letters), so this degree of polyphony rules the reading out as a valid strict-substitution decryption of the cipher. The two length-mismatched Dorabella readings (Sams, 100 characters; Roberts, 75 characters) and Thorley's reading of the 18-symbol Liszt Fragment are length-disqualified before the strict-substitution test applies; we note them only as preliminaries. Sams (1970) explicitly framed his reading as phonetic, with broken spelling, rather than as strict substitution, a frame the present pipeline cannot directly falsify. We propose substitution-consistency as the minimum standard for length-matched strict-substitution claims, and outline what an analogous falsifiable standard for phonetic claims would require.
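
The substitution-consistency test can be sketched in a few lines of Python. This is a minimal illustration, not the paper's pipeline: the function name `substitution_consistency`, the encoding of cipher symbols as arbitrary string labels, and the choice to drop spaces and punctuation from the plaintext before alignment are all assumptions made here.

```python
from collections import defaultdict

def substitution_consistency(cipher, plaintext):
    """Return conflicting cipher symbols for an aligned plaintext reading.

    cipher: sequence of hashable cipher-symbol labels (hypothetical encoding).
    plaintext: proposed reading; non-letters are dropped before alignment
    (an assumption, not necessarily the paper's normalisation).
    Returns None if the lengths differ (length-disqualified), otherwise a dict
    mapping each cipher symbol that must encode more than one letter to the
    set of letters it encodes. An empty dict means the reading is consistent.
    """
    letters = [c for c in plaintext.lower() if c.isalpha()]
    if len(letters) != len(cipher):
        return None  # no position-by-position alignment is possible
    readings = defaultdict(set)
    for symbol, letter in zip(cipher, letters):
        readings[symbol].add(letter)
    # Simple and homophonic substitution both require one letter per symbol.
    return {s: ls for s, ls in readings.items() if len(ls) > 1}

# Toy example: symbol "s1" would have to stand for both "a" and "c".
conflicts = substitution_consistency(["s1", "s2", "s1", "s3"], "abcd")
print(len(conflicts), max(len(ls) for ls in conflicts.values()))  # 1 2
```

Applied to the Packwood alignment, these two numbers would correspond to the 15 conflicting symbols and the eight-letter maximum reported above; a valid strict-substitution reading must return an empty dict.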

The second negative result concerns language identification. Under matched-budget simulated annealing (3,000 random restarts × 8,000 iterations under an Italian quadgram model), the cipher's best-of-SA dictcollision net signal is +0.150, against a 30-shuffle matched-budget baseline of +0.310 ± 0.130 (z = −1.23); the cipher scores below, not above, its own shuffled controls. We therefore cannot demonstrate Italian as the plaintext language under matched-budget calibration. A specific alphabet (BASE_MAPPING), found via lower-depth SA combined with zero-conflict crib locking and linguistic constraint propagation, decodes the cipher to text containing five middle-frequency Italian content morphemes (mandava, piume, alcun, dissi, odio). These tokens do not form a coherent Italian message, and we exhibit BASE_MAPPING as an illustrative false positive of SA on a short cipher rather than as a candidate decryption.
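
The matched-budget comparison amounts to running the identical search on the cipher and on shuffled copies of it, then expressing the cipher's score as a z-score against the shuffled distribution. The sketch below assumes a generic `search` callable standing in for the fixed-budget SA run; the paper's dictcollision scoring and Italian quadgram model are not reproduced, treating the reported ±0.130 as a sample standard deviation is an assumption, and `matched_budget_z` and `z_score` are hypothetical names.

```python
import random
import statistics

def z_score(observed, mu, sigma):
    """Standard score of an observed value against a baseline distribution."""
    return (observed - mu) / sigma

def matched_budget_z(cipher, search, n_shuffles=30, seed=0):
    """Score `cipher` and `n_shuffles` shuffled copies with the same search.

    `search` is a hypothetical stand-in for a fixed-budget search (e.g.
    simulated annealing with fixed restarts x iterations) returning its best
    score; it must use an identical budget on every call.
    Shuffling preserves symbol frequencies but destroys symbol order.
    """
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a spread")
    rng = random.Random(seed)
    observed = search(list(cipher))
    baseline = []
    for _ in range(n_shuffles):
        shuffled = list(cipher)
        rng.shuffle(shuffled)
        baseline.append(search(shuffled))
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation (assumed)
    if sigma == 0:
        raise ValueError("baseline has zero spread; z is undefined")
    return observed, mu, sigma, z_score(observed, mu, sigma)

# The reported figures: observed +0.150 vs. baseline +0.310 +/- 0.130.
print(round(z_score(0.150, 0.310, 0.130), 2))  # -1.23
```

With a toy order-sensitive search such as `lambda s: sum(a == b for a, b in zip(s, s[1:]))` on the string "aaaabbbbcccc", the unshuffled input scores 9 and the shuffled copies score lower on average, giving a positive z. The Dorabella result is the opposite case: the cipher sits below its shuffled controls.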

The paper's contributions are the substitution-consistency standard (concretely demonstrated on Packwood), the matched-budget calibration result extending Wase (2025) from English/Latin to Italian, and a worked example of the alphabet-search false-positive mode that calibration-free methods produce.
