Tuesday, January 1, 2008

National Treasure: Book of Secrets (2007)

The second installment of the National Treasure franchise brings us more riddles that unlock clues that bring more riddles. One of these clues (or was it a riddle? I can't keep track) is a burned piece of paper that contains a partial cipher text message. It turns out that this message was encrypted with the Playfair cipher, which was created in the mid-1800s by a gentleman named Charles Wheatstone and named after Lord Playfair, who promoted its use.

By modern standards Playfair is extremely weak, but at the time it offered a relatively simple method for encrypting messages that made frequency analysis attacks difficult, if not impossible, to perform.

If you are not familiar with substitution ciphers, the simplest example is ROT-13 (or rotate 13), a variation of the Caesar cipher that creates cipher text by replacing, or substituting, each letter in a word with the letter that is 13 places away in the Latin alphabet, wrapping around at the end.
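To make that concrete, here is a minimal Python sketch of ROT-13. The function name rot13 is my own choice for illustration, and it only handles the 26 unaccented Latin letters; everything else passes through unchanged.

```python
import string

def rot13(text):
    """Replace each letter with the one 13 places further along, wrapping at Z."""
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    # Map A-Z onto N-Z,A-M and do the same for lowercase.
    table = str.maketrans(
        upper + lower,
        upper[13:] + upper[:13] + lower[13:] + lower[:13],
    )
    return text.translate(table)

print(rot13("ETA"))                    # RGN
print(rot13("he departed yesterday"))  # ur qrcnegrq lrfgreqnl
print(rot13(rot13("he departed yesterday")))  # applying it twice restores the text
```

Because 13 is half of 26, the same function both encrypts and decrypts. Notice that the spaces and word lengths survive untouched.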


Any fan of Wheel of Fortune can tell you that the three most common letters in the English language are E, T, and A. ROT-13 turns those into R, G, and N, so with frequency analysis it is pretty easy to determine that R, G, and N represent E, T, and A, simply by the fact that they occur most often in the cipher text. You can do further analysis by looking at the common ending letters, the letters that most often follow E, etc. This type of analysis is made easier by the fact that ROT-13 keeps the structure of the words and sentences.
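Counting letters is only a few lines of code. The sketch below is one way to do it in Python; the helper name letter_frequencies and the sample sentence are made up for illustration, and the sample is far too short to be statistically meaningful. The cipher text is produced with the rot_13 codec from Python's standard codecs module.

```python
import codecs
from collections import Counter

def letter_frequencies(cipher_text, top=3):
    """Return the `top` most frequent letters as (letter, count) pairs."""
    letters = [c for c in cipher_text.upper() if "A" <= c <= "Z"]
    return Counter(letters).most_common(top)

# Hypothetical sample message, encrypted with ROT-13.
plain = "the treasure map states that the eastern gate leads to the statue"
cipher = codecs.encode(plain, "rot_13")

for letter, count in letter_frequencies(cipher, top=5):
    # Undo the rotation on each frequent letter to see the guess it suggests.
    print(letter, count, "->", codecs.encode(letter, "rot_13"))
```

With a reasonable amount of English text, the top of the list will usually be dominated by R, G, and N, which is exactly the giveaway described above.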

While still considered a substitution cipher, Playfair does a couple of things to break up frequency and structure. First, the plain text is broken down into groups of two letters called digraphs. If a grouping would produce a double-letter digraph, or there is a single letter left at the end, a filler character, typically "X," is used for the second letter. For example, "he departed yesterday" becomes "he de pa rt ed ye st er da yx." Second, the plain text is encrypted using a 5 x 5 table built from a key word or phrase, along with some relatively simple rules for replacing each digraph. Because the table holds 25 letters and a digraph never repeats a letter, frequency analysis has to work with 600 possible digraphs (25 x 24) rather than the 26 single letters of a Caesar-type cipher. The resulting cipher text will look something like "DA EA RD SA AE WT YG AQ ET ZY."
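Here is a rough Python sketch of both steps, using the textbook Playfair rules: letters in the same row move one place right, letters in the same column move one place down, and otherwise each letter keeps its row and takes the other letter's column. The function names are my own, and folding J into I is the usual convention rather than something stated above. The key word DEATH is also my assumption; it is used here because, under these rules, it reproduces the example cipher text.

```python
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # 25 letters, J folded into I

def build_table(key):
    """Return the 25 table letters row by row: key letters first, then the rest."""
    table = []
    for ch in (key.upper() + ALPHABET).replace("J", "I"):
        if ch in ALPHABET and ch not in table:
            table.append(ch)
    return table

def to_digraphs(plain, filler="X"):
    """Split text into letter pairs, padding doubles and a trailing single letter."""
    letters = [c for c in plain.upper().replace("J", "I") if c in ALPHABET]
    pairs, i = [], 0
    while i < len(letters):
        if i + 1 < len(letters) and letters[i + 1] != letters[i]:
            pairs.append(letters[i] + letters[i + 1])
            i += 2
        else:
            pairs.append(letters[i] + filler)
            i += 1
    return pairs

def encrypt(plain, key):
    table = build_table(key)
    pos = {ch: divmod(i, 5) for i, ch in enumerate(table)}  # letter -> (row, col)
    out = []
    for a, b in to_digraphs(plain):
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:    # same row: take the letter to the right
            out.append(table[ra * 5 + (ca + 1) % 5] + table[rb * 5 + (cb + 1) % 5])
        elif ca == cb:  # same column: take the letter below
            out.append(table[((ra + 1) % 5) * 5 + ca] + table[((rb + 1) % 5) * 5 + cb])
        else:           # rectangle: keep the row, swap the columns
            out.append(table[ra * 5 + cb] + table[rb * 5 + ca])
    return " ".join(out)

print(" ".join(to_digraphs("he departed yesterday")))
# HE DE PA RT ED YE ST ER DA YX
print(encrypt("he departed yesterday", "DEATH"))
# DA EA RD SA AE WT YG AQ ET ZY
```

This sketch ignores edge cases, such as a double X in the plain text, that a careful implementation would need to handle.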

One obvious weakness of Playfair is the fact that a digraph and its reverse encrypt to a pair of cipher digraphs that are also reverses of each other. From the example, you can see that "departed" contains a reversed pair of digraphs, "DE" and "ED." In the cipher text they can be easily found as "EA" and "AE." Knowing that "ED" is one of the 10 most common digraphs in English, you might be able to start deciphering "EA RD SA AE" by replacing the reversed digraphs to get "DE RD SA ED."
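Spotting those reversed pairs is easy to automate. The following Python sketch (the function name reversed_pairs is mine, and it assumes the cipher text is already split into space-separated digraphs) lists every digraph whose mirror image also appears in the message.

```python
def reversed_pairs(cipher_text):
    """Return (digraph, reversed digraph) tuples where both occur in the text."""
    digraphs = cipher_text.upper().split()
    present = set(digraphs)
    found = []
    for d in digraphs:
        r = d[::-1]
        # d < r reports each pair once and skips doubled letters like "XX".
        if r in present and d < r and (d, r) not in found:
            found.append((d, r))
    return found

print(reversed_pairs("DA EA RD SA AE WT YG AQ ET ZY"))
# [('AE', 'EA')]
```

Each pair it reports is a candidate for a common reversible pair in English such as ED/DE or ER/RE, which gives the analyst a place to start guessing.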

So, while Ben Gates was racking his brain to figure out the debt that all men pay, his unfunny sidekick Riley Poole could have easily enhanced his computer program to discover the key, or simply figured it out by hand. The small amount of cipher text may have complicated his analysis, but there are only so many word combinations and digraphs that could have produced "ME IK QO TX CQ TE ZY."