The Vigenere Cipher

The Vigenere Cipher is a clever variation on the Caeser shift cipher that is both easy to implement and resistant to very simple frequency analysis attacks. As such, it is more secure than any of the letter-for-letter substitution ciphers.

To encrypt a message like "here is how it works" using the Vigenere cipher, we first need a key. Typically this key is an easily remembered word or phrase. Suppose, for the sake of this example, our key is the word "vector". Translating the letters of our keyword into integers $\pmod{26}$ in the normal way, we find "vector" corresponds to $(21,4,2,19,14,17)$.

We then take our message and shift it's first letter by 21, it's second letter by 4, it's third letter by 2, it's fourth letter by 19, and so on... Once we get to the end of the key, we start back at the beginning, so the 7th letter is shifted by 21, the 8th is shifted by 4, and so on... to obtain the encrypted message "citxwjcsybhnjvml".

\begin{array}{lcccccccccccccccc}
&h &e &r &e &i &s &h &o &w &i &t &w &o &r &k &s\\
&21 &4 &2 &19 &14 &17 &21 &4 &2 &19 &14 &17 &21 &4 &2 &19\\
&c &i &t &x &w &j &c &s &y &b &h &n &j &v &m &l
\end{array}

Decrypting is easy if the key is known. One just shifts back by the same amounts.

However, note the monkey-wrench this throws in the method of trying to decrypt the message without the key by analyzing the frequency with each letter occurs in the message. The letter "e" might be the most frequent letter in our plaintext, but in the example above, it gets encoded sometimes as an "i", sometimes as an "x", and if our message was longer, we would see still other letters an "e" gets turned into...

This is not enough to stop us, however.

For the purposes of having a good example to talk about, consider the following plaintext message:

"The method used for the preparation and reading of code messages is simple in the extreme and at the same time impossible of translation unless the key is known. The ease with which the key may be changed is another point in favor of the adoption of this code by those desiring to transmit important messages without the slightest danger of their messages being read by political or business rivals. etc"

As an interesting tidbit, the passage above is used by Wade Trappe and Lawrence C. Washington in their book "Introduction to Cryptography with Coding Theory", and is taken from a short article in a 1917 issue of Scientific American that explains the Vigenere cipher and expresses an opinion as to its security.

Suppose someone has encrypted this message using a Vigenere cipher with a keyword of "codes" to obtain:

VVHQWVVRHMUSGJGTHKIHTSSEJCHLSFCBGVWCRLRYQTFSVGAHWKCUHWAUGLQHNSLRLJSHBLTSPISP
RDXLJSVEEGHLQWKASSKUWEPWQTWVSPGOELKCQYFNSVWLJSNIQKGNRGYBWLWGOVIOKHKAZKQKXZ
GYHCECMEIUJOQKWFWVEFQHKIJRCLRLKBIENQFRJLJSDHGRHLSFQTWLAUQRHWDMWLGUSGIKKFLRYV
CWVSPGPMLKASSJVOQXEGGVEYGGZMLJCXXLJSVPAIVWIKVRDRYGFRJLJSLVEGGVEYGGEIAPUUISFPBT
GNWWMUCZRVTWGLRWUGUMNCZVILE

Now, suppose we are trying to decrypt the message above.

If we had guessed or otherwise determined the keyword was 5 letters long, then we could grab every 5th letter starting with the first V above to get

VVUTTCCCQGCUNJTPJGKUQPKNJKYGKKGCJFQRKQJRQUDUKVPKVGGJJIVGJGGPFNCWUCE

Remember, all of these letters in this sampling should have been shifted by the same amount. Our first job is to figure out by how much these letters have been shifted -- that is to say, to figure out the first letter of our keyword is "C" (Recall the "C" in "Codes" represents a shift right of 2 letters)

Unfortunately, we can't just try every shift possible (there are only 26, remember). The reason for this is one of recognition -- how do you recognize which shift produced something in English? Remember, we only have every fifth letter. If you take any sentence -- like this one -- and yank out every fifth letter and look at it, it looks a lot like gibberish! (e.g., "ITNTLHEATYHELTLAIBS")

However, we can take advantage of the fact that one of these possibilities should "mostly agree" with English in the frequencies of the letters seen.

We know that the frequencies of the letters in English are given by:
$$\begin{array}{cccccccc}
a &b &c &d &e &... &y &z\\
0.082 &0.015 &0.028 &0.043 &0.127 &... &0.020 &0.001
\end{array}$$
Suppose we denote the frequencies of letters expected in an English message that has been Caeser-shifted $i$ letters to the right by a 26-dimensional vector $v_i$ as shown below. So for example,
$$\begin{array}{llllllll}
v_0 &= (&.082, &.015, &.028, &..., &.020, &.001)\\
v_1 &= (&.001, &.082, &.015, &.028, &..., &.020)\\
v_2 &= (&.020, &.001, &.082, &.015, &.028, &...,)\\
\vdots
\end{array}$$
One of these vectors should agree fairly closely with the frequencies of letters we see in our sampling of every 5th letter. Which vector that is tells us the shift amount for our sampling, and the first letter of our keyword.

So we compute the frequencies of letters seen in our sampling to find
$$u = (0,0,0.104,0.0149,0.0149,0.0298,...)$$
To find the vector in the previous list above that most closely matches the vector $u$, we recall that the dot product of two vectors is connected to the angle $\theta$ between those two vectors in the following way:
$$u \cdot v = |u||v| \cos \theta$$
If we want to find the two vectors $u$ and $v_i$ that most closely match, we want to find the two vectors with the smallest angle between them.

Noting that smaller angles produce larger cosine values and also noting that the magnitude of the denominator is the same for every $v_i$ as the same 26 numbers are involved each time (just in different orders), we can simply seek the two vectors $u$ and $v_i$ whose dot product is largest.

Computing these dot products we find:
$$\begin{array}{ll}
u \cdot v_0 &= 0.025\\
u \cdot v_1 &= 0.039\\
u \cdot v_2 &= 0.071 \leftarrow \textrm{Note how much bigger this one is!}\\
u \cdot v_3 &= 0.039\\
u \cdot v_4 &= 0.027\\
u \cdot v_5 &= 0.038\\
u \cdot v_6 &= 0.051\\
u \cdot v_7 &= 0.030\\
u \cdot v_8 &= 0.032\\
u \cdot v_9 &= 0.043\\
u \cdot v_{10} &= 0.034\\
u \cdot v_{11} &= 0.030\\
u \cdot v_{12} &= 0.034\\
u \cdot v_{13} &= 0.045\\
u \cdot v_{14} &= 0.036\\
u \cdot v_{15} &= 0.040\\
u \cdot v_{16} &= 0.043\\
u \cdot v_{17} &= 0.050\\
u \cdot v_{18} &= 0.039\\
u \cdot v_{19} &= 0.030\\
u \cdot v_{20} &= 0.033\\
u \cdot v_{21} &= 0.039\\
u \cdot v_{22} &= 0.037\\
u \cdot v_{23} &= 0.032\\
u \cdot v_{24} &= 0.049\\
u \cdot v_{25} &= 0.035\\
\end{array}$$
Seeing the biggest dot product above was between $u$ and $v_2$, we conclude that all of the letters in our sampling were shifted 2 letters to the right, and our keyword must hence start with a C. [Recall C = 2 when written in the usual way $\pmod{26}$.]

We find the rest of the letters of the keyword in a similar manner, sampling every 5th letter starting with second, third, fourth, and fifth letters of our overall encrypted message, respectively. Then, by shifting back the letters by these same amounts, we arrive at the decrypted message.

There is a lingering question, however. How did we know at the onset that the keyword had five letters?

$\require{amsmath}$To find the key length, we can take advantage of a very clever trick. Suppose we take the entire encrypted message and line it up with an offset of itself. So appealing to the same example used above, and using an offset of $k=4$ letters, we have:
$$\begin{array}{ccccccccccccccc}
V&V&H&Q&W&\fbox{V}&V&R&H&M&U&S&G&\cdots\\
&&&&V&\fbox{V}&H&Q&W&V&V&R&H&\cdots\\
\end{array}$$
Notice how the two boxed letters above agree. What is the probability of this happening? Specifically, what is the probability that $i^{th}$ letter of our encrypted message agrees with the $(i+k)^{th}$ letter of our encrypted message?

There are, of course, lots of ways in which these letters might agree. They could both be $A$'s, or they could both be $B$'s, or they could both be $C$'s, and so on... We'll have to find the probabilities of each one of these disjoint possibilities and add them up.

Suppose the $i^{th}$ letter was encoded with a Caeser-shift of $m$ letters to the right, while the $(i+k)^{th}$ letter was encoded with a Caeser-shift of $n$ letters to the right. Recalling the vectors $v_0,v_1,v_2,...$ that we used earlier, note that the letters that occur in the $i^{th}$ position then do so with frequencies found in $v_m$, and letters that occur in the $(i+k)^{th}$ position do so with frequencies found in $v_n$.

Suppose $v_m = (a_m, b_m, c_m, ..., z_m)$ and $v_n = (a_n, b_n, c_n, ..., z_n)$.

The probability that the $i^{th}$ and $(i+k)^{th}$ letters are both $A$'s is then the product $a_m a_n$. Similarly, the probability that the $i^{th}$ and $(i+k)^{th}$ letters are both $B$'s is then the product $b_m b_n$. As you might expect, this pattern continues -- resulting in the sum of all of these probabilities (and consequently the probability that these letters match) being
$$P(\textrm{match}) = a_m a_n + b_m b_n + \cdots + z_m z_n$$
With some degree of amazement, notice that we can write this another way...
$$P(\textrm{match}) = v_m \cdot v_n$$

Now, suppose we ask a follow-up question: "When is the probability of a match the highest?" The trivial answer is when the dot product $v_m \cdot v_n$ is maximized. But as we have seen before,
$$v_m \cdot v_n = |v_m||v_n| \cos \theta$$
is maximized when $\theta$, the angle between $v_m$ and $v_n$ is minimized.

Noting that when $v_m = v_n$, the angle between them must be zero, we can expect the highest probability of a match when $m=n$. Now observe that this happens whenever $k$ (the offset) is a multiple of the keyword length!

Since a higher probability of a match suggests it occurs more frequently, we simply need to count the number of matches seen in our encrypted message for several possible offsets $k$. The offset $k$ that produces the highest number of matches should be either our keyword length, or a multiple of that length. Ta-Da! Isn't that an amazingly clever attack?