$30
CSE341 – Programming Languages
Homework #1
Programming in Lisp (100 points): In this project, you will implement various functions for encoding
and decoding a sequence of words using a cipher alphabet.
Cipher Alphabet
A cipher alphabet is a one to one mapping to a plain text alphabet. An example of such mapping is
given below:
Plain Alphabet : a b c d e f g h i j k l m n o p q r s t u v w x y z
Cipher Alphabet: d e f p q a b k l c r s t g y z h i j m n o u v w x
Encoding and decoding is simply done by replacing the letter in the source alphabet with the
corresponding one in the target alphabet using the provided mapping. For example:
Original: testing this
Encoded : mqjmlgb nklj
Project Description
For this project, words are represented as lists of lower case symbols, e.g., the word “class" is
represented as '(c l a s s ). Paragraphs are lists of words, e.g.,'((c l a s s) (i s) (r e a d y) (f o r) (w o r k)).
A document is a list of paragraphs.
A document encoded with a cipher alphabet is not easy to break without the help of a computer.
You are asked to implement two different methods to break the cipher,
1. a brute force version that uses a spell checker, and
2. a version that uses knowledge about the distribution of particular letters in the English
language (frequency analysis).
You will need to write two functions Gen-Decoder-A and Gen-Decoder-B that take as input a
paragraph of encoded words, and return as output a function that takes as input the encoded
document, and returns the plain text of the decoded document. The function Code-Breaker takes
an encoded document and a decoding function as input, and returns the entire document in plain text.
The two decoder functions implement different strategies to break the encrypted code. Both have
advantages and disadvantages. Your Lisp program will provide an infrastructure to assess the
effectiveness of these decoding approaches for different document types.
Brute Force with Spell Checker Gen-Decoder-A
The algorithm for the brute force code breaker is simple. The input words are encoded and all of them
are available in the provided dictionary. For each decoded word, you will assume that the word is
present in the dictionary. Then you’ll try to match every decoded word to dictionary words to find the
cipher’s mapping. For this method, you need a dictionary which contains each possible words. This
dictionary is provided in dictionary.cl. You may also define your own small dictionary for quick testing.
This will significantly reduce your runtime, an example test dictionary is available in test-dictionary.cl
file.
Frequency Analysis Gen-Decoder-B
This code breaker is based on the fact, that in the English language some letters occur more often than
others. Knowing the distribution of the different letters, this method just counts the number of
occurrences of each letter in the encoded words. If there are enough words to be statistically relevant,
the letter distribution can be used to identify the most common letters in English, namely 'e', 't', 'a',
‘o’, ‘i’ and ‘n’. In other words, the assumption is that the most frequent encoded letter has to
correspond to the letter 'e', etc. This means that one character in the mapping is decided. One can
continue on the second and third letters similarly. You are asked to implement two types of this
decoder.
• Gen-Decoder-B-0: Assume that the most frequent six letters are (in the correct order) 'e',
't', 'a', ‘o’, ‘i’ and ‘n’. After doing the frequency analysis for these six letters and establishing
the mapping, you are asked to determine the rest of the letters using a brute force method.
• Gen-Decoder-B-1: Implements a faster strategy. It assumes that the first six most frequent
letters are the same as above. But this thime, we decide on the other letters by looking at the
co-occurance of letters. For example, “is” occurs much more frequent than “ig”.
Implementation
For the implementation of these functions, you can use standard built-in Lisp functions such as append.
You must not use any global variables.
How To Get Started
Copy the files decode.cl, include.cl, test-dictionary.cl, dictionary.cl and document.cl into your own
subdirectory. You will implement your code breaker in file decode.cl. File test-dictionary.cl contains a
small dictionary consisting of only six words. Use it to debug your program. File dictionary.cl contains
a list of over 45,000 words allowing you to generate a realistic spell checker. You must not change this
file. Finally, file document.cl contains two sample documents to test your encoder, decoder, etc. You
should use your own test cases as well.