Using the Kasiski examination technique works particularly well when messages contain repeating ciphertext fragments. However, if the author of a message is particularly careful they may be able to avoid these repeating fragments. At this point it would be difficult to determine the keylength manually. Fortunately, with the Index of Coincidence and Chi-Squared scoring, you can almost completely automate the cracking of Vigenère ciphers.
While the Kasiski method elegantly uses repeated ciphertext fragments and the distance between them to logically deduce the length of the keyword used in a Vigenère cipher, this method is a brute force attack to determine key length.
We know from earlier in this chapter that a monoalphabetic cipher has an index of coincidence near $0.066$ while a polyalphabetic cipher has an index of coincidence near $0.038$. We also know that if a Vigenère enciphered text is split into $n$ groups, where $n$ is the length of the keyword, that every letter in each group was enciphered with the same key letter, resulting in a monoalphabetic distribution of letters within each group.
We can use Python to guess key lengths, split the text into groups based on the guesses, and check to see if the groups of letters are monoalphabetic or polyalphabetic. If the groups appear to be monoalphabetic then we'll know we've guessed the correct key length.
Suppose you have the following ciphertext, stored to the string
ZQQTK PQUWD PGMWD BQTXY LFQWL SHAJB UCIPV KUQEJ RBAAC LRSIZ ZCRWT LDFMT PGYXF ISOSE ASZXN PHTAY HHIIR ADDIJ LBFOE VKUWW VFFLV TCEXG HFFXF ZVGXF BFQEI ZOSEZ UGFGF UJUGK PCZWZ UQQJI VAFLV CSDCX YOPYR SQTEI HQFII VTAYI LRGGR AWARN LAGWK JCZXZ UIMPC FTAVX LHMRU LAMRT PDMXV VIDWV SJQWW YCYOE VKXIU NSBVV CWAYJ SMMGH BWDIU DSYYJ AGQXR ZWPIF SRZSK PCZWR URQQS YOOIW YSELF USEEE KOEAV SSMVE DSYYJ APQHR PZKYE SSMVE PBSWF TSFLZ UUILZ JVUXY HGOSJ AIERF ZAMPC SONSL YOZHR ULUIK FHAET XIUVV HBPXY PGPMW MWOYC AMMXK HQTIJ PHEIC MAAVV JZAWV SMFSR UOSIZ UKTMT ODDSX YSEWY HGSEZ USPEJ AFARX HGOIE KSZGP VJQVG YSVYU PQQEE KWZAY PQTTV YGARJ HBPXY PBSWR YSPEP IMPEP MWZHZ UUFLV PFDIR SZQZV SWZPZ LIAJK OSUVT VBHIE AWARR SJMPL LHTIJ HAQTI PBOMG SSEAY PQTLR CSEAV WHMAR FHDEU PHUSE HZMFL ZSEEE KKTMT OODID HYURX YOBMU OOHST HAARX AVQVV CSZYV ZCRWZ USOYI PGFWR UREXI PDBME NHTIK OWZXR DRDCM LWXJI VAMXK YOOXZ CSEYG LFEXZ AWARJ HFQAF YYURX HGMGK PJQPP PBXMK LFMXL YSMWZ UGAGZ LHKXY LQDIU BZUXP VTARV DFUXV YCDXY LDMVK POXMK FCREE VHTII MWZHJ HGBSN LFRYC HHAYT OGFSE LOZHR ZKTSC LGAQV HQTEJ AWEID LBFME AVQLV HZFLP ZQQTK PQUWD VTMXV TDQVR ASOPR ZGAJR UHMKF UWEXJ HGFLV KFQED ZCRGF UGQVM HHUWD VFFLV PABSJ AIDIJ VTBPL YOXMJ AGURV JIDIJ PBFLV JVGVT OVUWK VFKEE KHDEU PHUSE DVQXY LFAJR UQUIE ACDGF TDMVR AWHIC FFQGV UHFMD LGMVV ZINNV JHQHK VJQVP KWRJV YSZXY HBPPZ UURVF THTEK DVUGY AVQME KIXKV UQQSI JFQHL SWFCF MTAVD LFMKV ZQAYC KOXPF DAQVV ZHMXV TSZXJ HFQNV HZAYJ SMIEK JVQHR URFLV TCFMM LGAJK OSIVZ ASDJF YAMWZ TDAVK HBFEE PBSVV KWQRK PBFLV HBMPP ZWESW OWELZ ZHAVP HGFLV MOOXJ OSDIT VFPWG YCNES PZUXP PGMTF DSDJL SOZHK YCGFC LGAQV ASEXR URUXZ ZPKXY PGFVF BPXIJ VAQWK HBPEI KHTEK HZMVX LDAVK PCZSW OWEXF YWOEC LJUHV UQQMJ ZWRXV KQARJ PGFIE JMUWE VZQWJ WSDXZ UOOMF BGMRU LLMGK PBSME PHEHV TOZHJ PBNVZ LTFSN YWFIR OWEXF YMIID BGFOE VKYSI LHTEE TSDIW HQFWY BAMRE HHGVV CWQAV KIZHV YOZME KIOXZ VBAJV EHQRU LRQBG LFUIE JSUWK OSNIJ AVQPG ACFLV JFUXZ JWEQF MVGQR UVUWK VFKLZ ZHAVZ JOXGY HFMGK LFEGR UCZPP ISQWK PAMXV KPKXY LGFEE KODHN OWOLY BAMRV EDQVZ LBOIN OSFLV YOOXL HZAVK YOPMK PCZEI FVMWW BFZMJ OSPXF MCDQT VFDIT AJUIN ZCRME KWHMU BOXWN LAGWK YSSEI KHTID HGRSI TWZKG HFFWF MOSVV HHILF SSIID BGFQV HGGVV AVQQS FHTIZ YFQPR AWARK VHTID HGESW ISURX ZPKAY VAFLV FODIJ BFDSL URQHR URURT VBFID WZMXZ UUFLV PBOMU LBFWZ UHTIZ YZUZV ZCDGF URUXZ VBILZ JVFVR KWFMF UVMWY HBPIU KCIRK VIEAV TIEXI HHTII JCZWZ KSDXY LUQRV YOXFV HFURX VTFLV DVAPV UODVR AWHIK OOZXY LFQWG LQFMM LDDSS HPUPZ AMAJZ AGPIK HWXW
Trying the key lengths between $1$ and $8$, and averaging the index of coincidence values for all groups of letters generated.
Key Length Guess: 1 Average Index of Coincidence: 0.0414 Key Length Guess: 2 Average Index of Coincidence: 0.0411 Key Length Guess: 3 Average Index of Coincidence: 0.0412 Key Length Guess: 4 Average Index of Coincidence: 0.0409 Key Length Guess: 5 Average Index of Coincidence: 0.0658 Key Length Guess: 6 Average Index of Coincidence: 0.0410 Key Length Guess: 7 Average Index of Coincidence: 0.0417 Key Length Guess: 8 Average Index of Coincidence: 0.0406
it can be seen that key length of $5$ is the only guess so far that results in groups of letters that are monoalphabetic. As a result, this is our best best for the correct key length.
It's worth pointing out that multiples of the key length will also be likely to have index of coincidence values that are close to $0.066$ as well. To see why that would be the case, take a key length guess of $10$, even though the actual key is $5$. The letters in groups 1 and 6 would have all been enciphered by the first letter in the keyword, since the keyword repeats after its first use. Groups 2 and 7 would have had their letters enciphered with the same second letter in the keyword, and so on. As long as there are a sufficient amount of letters in each group to accurately calculate the index of coincidence, every multiple of $5$ will calculate an index of coincidence close to $0.066$
Key Length Guess: 5 Average Index of Coincidence: 0.0658 Key Length Guess: 10 Average Index of Coincidence: 0.0648 Key Length Guess: 15 Average Index of Coincidence: 0.0647 Key Length Guess: 20 Average Index of Coincidence: 0.0647 Key Length Guess: 25 Average Index of Coincidence: 0.0652 Key Length Guess: 30 Average Index of Coincidence: 0.0639 Key Length Guess: 35 Average Index of Coincidence: 0.0674 Key Length Guess: 40 Average Index of Coincidence: 0.0645 Key Length Guess: 45 Average Index of Coincidence: 0.0643 Key Length Guess: 50 Average Index of Coincidence: 0.0632 Key Length Guess: 55 Average Index of Coincidence: 0.0647 Key Length Guess: 60 Average Index of Coincidence: 0.0645 Key Length Guess: 65 Average Index of Coincidence: 0.0685 Key Length Guess: 70 Average Index of Coincidence: 0.0681 Key Length Guess: 75 Average Index of Coincidence: 0.0645 Key Length Guess: 80 Average Index of Coincidence: 0.0634 Key Length Guess: 85 Average Index of Coincidence: 0.0667 Key Length Guess: 90 Average Index of Coincidence: 0.0649 Key Length Guess: 95 Average Index of Coincidence: 0.0675 Key Length Guess: 100 Average Index of Coincidence: 0.0621
This ciphertext has approximately $2,000$ characters, and even splitting it into $100$ groups of $20$ characters is enough to see a monoalphabetic distribution of characters.
Group: 1 Min Score: 36.90910 Key value: 7 Key Letter: H Group: 2 Min Score: 41.31034 Key value: 14 Key Letter: O Group: 3 Min Score: 19.01105 Key value: 12 Key Letter: M Group: 4 Min Score: 36.14494 Key value: 4 Key Letter: E Group: 5 Min Score: 19.28371 Key value: 17 Key Letter: R Keyword: HOMER
Now that the keyword is known, you can attempt to decipher the message to determine if the analysis was correct.
print( vigenereDecipher( ciphertext, 'HOMER' ) )
And it can be seen that this is a passage from Homer's The Illiad.
Can you write Python functions to:
- Split ciphertext into a larger and larger number groups until the average index of coincidence is greater than $0.060$?
- Use chi-squared scoring and a determined keyword length to determine the keyword letters?