The Voice But Not the Song: A Shorthand Hypothesis and the Statistical Fingerprint of the Voynich Manuscript
v2 (2026-04-19): Following reviewer feedback from Cryptologia and correspondence with Michael Greshko, this revision leads with a three-diagnostic simultaneous match (entropy shift + cross-boundary MI + frequency–connectivity). Figure 1 in v1 contained erroneous tachygraphic magnitudes; these have been corrected, and the accompanying numbers have been extended and updated to reflect the generalized Naibbe analysis. The v1 preprint (2026-03-23) relied on the entropy shift as the primary discriminator.
Abstract
We propose Italian syllabic tachygraphy, a medieval shorthand tradition documented in northern Italian notarial archives, as the encoding mechanism most consistent with the statistical fingerprint of the Voynich Manuscript (Beinecke MS 408), and develop a framework for evaluating this hypothesis against the manuscript's known properties.
The tachygraphic model is the only tested mechanism that simultaneously reproduces three independent Voynich statistical signatures at the hypothesis-consistent syllable-as-token granularity: the entropy shift signature (cosine +0.820 vs. Latin), Currier's cross-boundary mutual information anomaly (tachygraphy 1.285× vs. observed 1.450×), and the Timm–Schinner frequency–connectivity correlation (tachygraphy Spearman ρ = +0.585 vs. observed +0.618). Of these, the entropy shift alone is not specific: a generalized Naibbe verbose substitution cipher (Greshko, 2026) also reproduces it (+0.983) but fails both token-adjacency tests (1.002× MI; ρ = +0.235), even under a 200-run parameter grid search. A simplified Naibbe implementation (Greshko, 2025) produces an anticorrelated entropy shift (−0.843), and the self-citation (−0.153) and Cardan grille (0.49–0.59) mechanisms are similarly eliminated. The independently derived statistical model matches Costamagna's 1953 catalog on all six structural dimensions tested.
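The two token-adjacency diagnostics can be illustrated with a minimal Python sketch. The abstract does not define them operationally, so the definitions here are assumptions: cross-boundary MI is taken as the plug-in mutual information between the last symbol of each token and the first symbol of the next, divided by its mean over token-order shuffles (under this normalization, 1.0× means no adjacency dependence beyond chance), and "connectivity" is taken as the number of other token types at Levenshtein distance 1, correlated with type frequency by Spearman's ρ. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum over cells of p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with counts.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def boundary_pairs(tokens):
    """(last symbol of token i, first symbol of token i+1) for each adjacent pair."""
    return [(a[-1], b[0]) for a, b in zip(tokens, tokens[1:])]


def cross_boundary_mi_ratio(tokens, n_shuffles=200, seed=0):
    """Observed boundary MI divided by its mean under token-order shuffling.

    ASSUMPTION: shuffle baseline; the paper's exact normalization may differ.
    """
    rng = random.Random(seed)
    observed = mutual_information(boundary_pairs(tokens))
    shuffled = list(tokens)
    null_total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_total += mutual_information(boundary_pairs(shuffled))
    return observed / (null_total / n_shuffles)


def levenshtein(a, b):
    """Edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (tie-aware)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def frequency_connectivity_rho(tokens):
    """Spearman rho between type frequency and number of distance-1 neighbor types.

    ASSUMPTION: 'connectivity' = count of other types at Levenshtein distance 1.
    """
    freq = Counter(tokens)
    types = list(freq)
    connectivity = [sum(1 for u in types if u != t and levenshtein(t, u) == 1)
                    for t in types]
    return spearman_rho([freq[t] for t in types], connectivity)
```

The shuffle baseline is a deliberate choice in this sketch: plug-in MI is biased upward on finite samples, and dividing by the MI of shuffled sequences of the same tokens cancels most of that bias, so ratios from corpora of different sizes become roughly comparable.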
A signal isolation method identifies 56 decoded words as statistically genuine under permutation testing (p = 0.001 for count, p = 0.011 for linguistic coherence) and yields 22 word-level content identifications (pharmaceutical Latin: ratione, coralli, diasene, stercora; p = 0.009). The language identification is validated by a controlled comparison: when the same stroke-feature framework is optimized against a German medical dictionary, the resulting assignment table produces fewer signal words and fails all three coherence criteria that the Latin-Italian table passes.
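A count-based permutation test of this kind can be sketched as follows. The abstract does not describe the null model, so this sketch assumes one plausible choice: the assignment table's syllable values are shuffled across symbols, each shuffled table decodes the same words, and the p-value is the share of null tables matching at least as many dictionary entries as the real one, with the usual +1 correction. Under that convention, 999 permutations give a smallest attainable p of 1/1000 = 0.001. The function names, symbols, and syllable splits are hypothetical; only the two Latin words come from the abstract.

```python
import random


def count_dictionary_hits(words, table, lexicon):
    """How many symbol sequences decode (via table, concatenated) to a lexicon entry."""
    hits = 0
    for word in words:
        decoded = "".join(table.get(symbol, "?") for symbol in word)
        hits += decoded in lexicon
    return hits


def permutation_p_value(words, table, lexicon, n_perm=999, seed=0):
    """One-sided permutation test of the hit count against value-shuffled tables.

    ASSUMPTION: the null shuffles syllable values across symbols; the paper's
    null model is not specified in the abstract.
    """
    rng = random.Random(seed)
    observed = count_dictionary_hits(words, table, lexicon)
    symbols, values = list(table), list(table.values())
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        null_table = dict(zip(symbols, values))
        at_least += count_dictionary_hits(words, null_table, lexicon) >= observed
    # The +1 terms count the observed table as one member of the permutation set.
    return observed, (1 + at_least) / (1 + n_perm)


# Hypothetical toy data: symbol names and syllable splits are illustrative only.
table = {"A": "ra", "B": "ti", "C": "o", "D": "ne", "E": "co", "F": "lli"}
lexicon = {"ratione", "coralli"}
words = [("A", "B", "C", "D"), ("E", "A", "F")] * 5
observed, p = permutation_p_value(words, table, lexicon)
```

Shuffling the existing values, rather than drawing arbitrary syllables, keeps the null tables' syllable inventory identical to the real table's, so any excess of hits reflects the symbol-to-value pairing rather than the choice of syllables.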
We systematically evaluate the model against seven well-known properties of Voynichese, finding three explained, three partially explained, and one genuine limitation: the 21 confirmed syllable values cover only 14.4% of Latin text, an arithmetic gap that explains why connected readable text has not been achieved. The model is over-determined (328 constraints vs. 29 degrees of freedom) and generates five specific falsifiable predictions. This constitutes hypothesis development, not a decipherment.
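The coverage figure can be read as a token-weighted share: of all syllable tokens in a syllabified Latin reference text, the fraction whose value is among the confirmed syllables. The sketch below assumes that definition and a pre-syllabified corpus (the abstract specifies neither the corpus nor the syllabification). It also illustrates the arithmetic gap under the simplifying assumption that syllables are independent: a word is fully readable only if every syllable is covered, so the chance falls roughly as 0.144^k for a k-syllable word (about 0.003 for k = 3). The function names and toy corpus are hypothetical.

```python
from collections import Counter


def token_coverage(syllabified_words, known_syllables):
    """Fraction of syllable tokens (not types) whose value is in known_syllables."""
    counts = Counter(syl for word in syllabified_words for syl in word)
    total = sum(counts.values())
    covered = sum(n for syl, n in counts.items() if syl in known_syllables)
    return covered / total


def fully_readable_share(coverage, n_syllables):
    """Chance every syllable of an n-syllable word is known, ASSUMING independence."""
    return coverage ** n_syllables


# Hypothetical toy corpus, pre-split into syllables.
corpus = [["ra", "ti", "o", "ne"], ["co", "ra", "lli"]]
print(token_coverage(corpus, {"ra"}))  # 2 of 7 tokens covered
for k in (1, 2, 3, 4):
    print(k, round(fully_readable_share(0.144, k), 4))
```

The independence assumption is a simplification; real syllable co-occurrence would shift these numbers, but the geometric decay with word length is what makes low token coverage translate into very little connected text.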