[Haskell] Caesar cipher cracker using frequency analysis

Questions about programming languages and debugging
Post Reply
User avatar
IceDane
Fame ! Where are the chicks?!
Fame ! Where are the chicks?!
Posts: 197
Joined: 12 Aug 2009, 16:00
14

[Haskell] Caesar cipher cracker using frequency analysis

Post by IceDane »

I just started learning haskell. This is probably the first real thing I've done. The comments should explain most of it.

Code: Select all

import Data.List
import Data.Char

-- Frequency table for the english language.
-- The first element is the the expected percentage of As used 
-- in an english text. The most common letter is e, at 12.7 percent.
table :: [Float]
table = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.2, 0.8, 4.0, 2.4,  6.7, 7.5, 2.0, 0.1, 6.0, 6.3, 9.1, 2.8, 1.0, 2.4, 0.2, 2.0, 0.1]

-- Shifts a character to the right if positive, left if negative. Wraps around.
shift :: Int -> Char -> Char -- Modulus handles the wraparound(shift 1 'z' = 'a')
shift n c | isUpper c = chr $ ord 'A' + ((ord c + n - ord 'A') `mod` 26)
          | isLower c = chr $ ord 'a' + ((ord c + n - ord 'a') `mod` 26)
          | otherwise = c        

-- Counts the number of times an element occurs in a list
count :: (Eq a) => a -> [a] -> Int
count e []     = 0
count e (x:xs) | e == x = 1 + count e xs
               | otherwise = count e xs 

-- Creates a table of frequencies for every letter in a particular string
freqtable :: String -> [Float]
freqtable [] = []
freqtable l  = [(fromIntegral . count c $ map toLower l) / (fromIntegral . length . filter (not . isSpace) $ l) * 100 | c <- ['a' .. 'z']] 

-- Encodes the message, shifting it n to the right(or n to the left is n is negative)
encode :: Int -> String -> String
encode _  [] = []
encode n  xs = map (shift n) xs

{- Chi Squaring function - calculates the chi square on a list of floats
-- It is defined as the sum of the squared differences of two 'adjacent' elements
-- over the the second element. 
-- It is used to determine which distribution of characters is closest to the 
-- distribution in the table above.-} 
chisqr :: [Float] -> [Float] -> Float
chisqr []     _      = 0
chisqr _      []     = 0
chisqr (o:os) (e:es) = ((o-e)^2)/e + chisqr os es

-- Rotates a list to the left by i places.
-- rotate 1 "123" becomes "231"
rotate :: Int -> [a] -> [a] 
rotate _ [] = []
rotate i xs = drop i xs ++ take i xs

-- Finds the first position of an element in a list.
position :: (Eq a) => a -> [a] -> Int
position e (x:xs) | x == e = 0
        | otherwise = 1 + position e xs

{- Attempts to determine the shift factor used to encrypt
-- the text passed to it.
-- To do this, it first gets the character frequencies of the encrypted
-- text. It then gets the chi square value of it, and all subsequent rotations
-- from 0 - 25. The rotation that has the smallest chi square value is most 
-- likely to be the shift factor for the plaintext. 
-- The longer the plaintext, the smaller the minimum chi square value will be.
-- This als implies the opposite - the smaller the plaintext, the lower the chance
-- of being able to crack the ciphertext is.
-}
crack [] = []
crack xs = encode (-minIndex) xs 
         where frequencies = freqtable xs 
               step a (b:bs)  = rotate a frequencies : b : bs
               step a []      = [rotate a frequencies]    
          chiVals        = map (\x -> chisqr x table) $ foldr step [] [0..25]
               minIndex       = position (minimum chiVals) $ chiVals

User avatar
leetnigga
Fame ! Where are the chicks?!
Fame ! Where are the chicks?!
Posts: 447
Joined: 28 Jul 2009, 16:00
14

Post by leetnigga »

Nice work there, IceDane.

There are a few things I would do differently.

I would write shift like this:

Code: Select all

-- Shifts a character to the right if positive, left if negative. Wraps around.
shift :: Int -> Char -> Char
shift n c = if isAlpha c then
                let a = if isLower c then 'a'
                        else 'A' in
                chr $ ord a + (ord c + n - ord a) `mod` 26 -- Modulus handles the wraparound (shift 1 'z' = 'a')
            else c
in order to avoid duplication, and to put your comment in the right place.

I would write count like this:

Code: Select all

-- Counts the number of times an element occurs in a list
count :: Char -> String -> Int
count c = length . filter (==c)
but I'm not sure which version would be more efficient.

I would write the frequency table generator function more like this:

Code: Select all

-- Creates a table of frequencies for every letter in a particular string
freqtable :: String -> [Float]
freqtable [] = []
freqtable l  = [(fromIntegral . count c $ alpha) / (fromIntegral numAlpha) * 100 | c <- ['a' .. 'z']]
               where alpha = map toLower . filter isAlpha $ l
                     numAlpha = length alpha
so that it's more obvious you're calculating the quotient of the current character count and the total character count. I replaced your (not.isspace) by isAlpha to count only alphabetic characters.

I would write encode like this:

Code: Select all

-- Encodes the message, shifting it n to the right(or n to the left is n is negative)
encode :: Int -> String -> String
encode n = map (shift n)
because map already takes care of empty lists, and this version is in a more point-free style.

I would write the Chi squaring function this way:

Code: Select all

{- Chi Squaring function - calculates the chi square on a list of floats
-- It is defined as the sum of the squared differences of two 'adjacent' elements
-- over the the second element.
-- It is used to determine which distribution of characters is closest to the
-- distribution in the table above.-}
chisqr :: [Float] -> [Float] -> Float
chisqr xs ys = sum $ zipWith (\a b -> ((a-b)^2)/b) xs ys
to get more declarative code that is closer to the mathematical definition.

:)

It's great to see some more Haskell on here.

User avatar
IceDane
Fame ! Where are the chicks?!
Fame ! Where are the chicks?!
Posts: 197
Joined: 12 Aug 2009, 16:00
14

Post by IceDane »

leetnigga wrote:Nice work there, IceDane.

There are a few things I would do differently.

I would write shift like this:

Code: Select all

-- Shifts a character to the right if positive, left if negative. Wraps around.
shift :: Int -> Char -> Char
shift n c = if isAlpha c then
                let a = if isLower c then 'a'
                        else 'A' in
                chr $ ord a + (ord c + n - ord a) `mod` 26 -- Modulus handles the wraparound (shift 1 'z' = 'a')
            else c
in order to avoid duplication, and to put your comment in the right place.

I would write count like this:

Code: Select all

-- Counts the number of times an element occurs in a list
count :: Char -> String -> Int
count c = length . filter (==c)
but I'm not sure which version would be more efficient.

I would write the frequency table generator function more like this:

Code: Select all

-- Creates a table of frequencies for every letter in a particular string
freqtable :: String -> [Float]
freqtable [] = []
freqtable l  = [(fromIntegral . count c $ alpha) / (fromIntegral numAlpha) * 100 | c <- ['a' .. 'z']]
               where alpha = map toLower . filter isAlpha $ l
                     numAlpha = length alpha
so that it's more obvious you're calculating the quotient of the current character count and the total character count. I replaced your (not.isspace) by isAlpha to count only alphabetic characters.

I would write encode like this:

Code: Select all

-- Encodes the message, shifting it n to the right(or n to the left is n is negative)
encode :: Int -> String -> String
encode n = map (shift n)
because map already takes care of empty lists, and this version is in a more point-free style.

I would write the Chi squaring function this way:

Code: Select all

{- Chi Squaring function - calculates the chi square on a list of floats
-- It is defined as the sum of the squared differences of two 'adjacent' elements
-- over the the second element.
-- It is used to determine which distribution of characters is closest to the
-- distribution in the table above.-}
chisqr :: [Float] -> [Float] -> Float
chisqr xs ys = sum $ zipWith (\a b -> ((a-b)^2)/b) xs ys
to get more declarative code that is closer to the mathematical definition.

:)

It's great to see some more Haskell on here.
Thanks. As you can see, the language is still so new to me that I failed to notice those repeating patterns that many of the built-in functions take care of, like the filter for counting the the chars and such.

I guess it'll come with time. Thanks again.

User avatar
leetnigga
Fame ! Where are the chicks?!
Fame ! Where are the chicks?!
Posts: 447
Joined: 28 Jul 2009, 16:00
14

Post by leetnigga »

IceDane wrote:Thanks. As you can see, the language is still so new to me that I failed to notice those repeating patterns that many of the built-in functions take care of, like the filter for counting the the chars and such.

I guess it'll come with time. Thanks again.
You're welcome. Good luck with your Haskell.

Post Reply