EECS 380 Homework 5

Winter Semester '01

Assigned Sunday April 1 2001. (Preview was available as of March 29th)
Due Tuesday, April 10th, 8pm, EECS 380 box in room 2420 EECS

Updates

APR 8: Included a turn-in list just to be clear.
APR 7: Added a couple of online references for Kruskal's algorithm
APR 7: Several new handouts are available for those who have difficulties with this HWK (binary read-write and containers of pointers are explained).
APR 4: Encoded messages mailed out. Discussion of reading binary files put in additional material on main page.
APR 3: Problem 3: What weights should be used on the edges of the graph is clarified.
APR 1: Problem 2 restricted. The characters allowed in the pass-code were restricted to make the runtime more reasonable. (about 1/4 of a million possible pass-codes rather than 16 Meg). Also as there are always 3 chars in the passcode that argument was eliminated.
APR 1: two typos corrected. Important to note that (c^x)^x== x was wrong. Should have been (c^x)^x== c.

BST-ordered arrays
Recall that in homework 4, you viewed an array of N elements as a balanced binary tree. Note that the balanced requirement was critical since arrays do not have holes in them.
Now, instead of counting heap-orders, you need to count BST-orders in the same context. Assume that N >2 and that all N elements are different.
1. If the array is sorted in a descending order, will that be a BST-order? (possible answers: never, sometimes yes and sometimes no, always) prove your answer.
2. Prove that there is always at least one BST-order. Do this by giving an algorithm to BST-order any given N elements.
3. Does the number of different BST-orders depend on the actual values of the N elements? (as long as all elements are different)
4. Count BST-orders on N different elements; possible approaches are
  - Analytical, e.g., find a closed-form expression or a reasonably-accurate estimate, then sanity-check it for small N and prove it.
  - Empirical, i.e., implement an algorithm that would estimate the number of or count all BST-orders for N elements.
  Note that each of the two approaches has advantages and drawbacks. If you come up with a closed-form expression, you can save a lot of time since no code needs to be written. If there is no closed-form expression, all attempts to find it can be futile. Also note that if the number of BST orders is very small, then randomized sampling will require a very large number of samples. If you decide to solve this question empirically, you must solve for all array sizes up to and including 15.
Brute-force password cracker (dictionary attack) using STL's hash_set
You have been asked, by the secret head of the NSA (one Igor "white hat" Markov) to break the encryption algorithm used by the evil villain Mark "nogoodnik" Brehob. Thankfully Brehob isn't the swiftest villain the NSA has ever faced -- he uses a really bad encryption scheme. Three encrypted messages from Brehob will be mailed out to you.
It is known that what Brehob uses is the XOR operation (^ in C++) on single characters. He takes his secret message and XORs it with an unknown pass-code consisting of 3 ASCII characters.
For example, encoding the plaintext "SECRET MESSAGE" with pass-code "BLA"
```
 SECRET MESSAGE
 BLABLABLABLABLA 
```
The encrypted message will consist of characters 'S'^'B', 'E'^'L', 'C'^'A' and so on.
The same pass-code and algorithm are used for encryption and decryption. This technique is based on the fact that for two chars c and x (c^x)^x== c is always true.
You are to find the unknown pass-code and decrypt a given message, using "the dictionary attack". It assumes that the secret message only uses ASCII characters and consists of space-separated words, each of which is found in a given dictionary. You can use isascii() (see man isascii) and can use a supplied dictionary (see below).
Algorithm: go over all possible pass-codes and "decrypt" the message using each proposed pass-code. Check the resulting string against the dictionary. If it consists entirely of words from the dictionary, you cracked the pass-code and decrypted the message. Otherwise, check the next pass-code. In order to speed up the dictionary check, you need to read the dictionary into memory and put all of it into an STL's hash_set (make sure that your workstation has enough memory).
The characters in the passcode will consist of only: lower case letters, upper case letters and numbers (0-9). No other ASCII characters will be in the pass-code.
Your program should take the following arguments:
- a dictionary file (one word per line); Unix systems typically have a couple of dictionaries in /usr/share/dict/ or /usr/dict/ (they are used by Unix spellchecker ispell).
- an encrypted message
Depending on the outcome, your program should print one of
- "Tried all possible pass-codes, didn't find a good one"
- "The secret message is: XYZ... The pass-code is: BLA"
Turn in:
1. A copy of your code. Please use enscript -2r and print it double-sided.
2. The three decrypted messages and associated pass-codes.
3. An explanation as to why hash_sets are useful in this project (instead of sets for example)
4. Describe how much time your program required to crack those three messages and how big the dictionary was. Further, estimate how long your program would have taken if Brehob had used 4,5 or 6 character pass-codes.
Note: Encryption by XOR is very very weak, and can be broken much more efficiently by smarter attacks. When you graduate and go to work (or grad school), do not use encryption by XOR when encryption strength is important.
You must use the described dictionary attack (brute force) for this project.
Note: Since encrypted messages are not in ASCII, you need to read/write them as binary (see handout 1 and handout 2).
Kruskal
References: example 1, example 2, theory
Implement Kruskal's algorithm. The only restrictions are that:
- You can't just copy C/C++ code from the Internet.
- You must use the STL for at least a part of your algorithm
You are to turn in
1. A MST for 100 randomly generated points in a plane. (Use your student ID number as the seed for the randomly generated points.) You may bound the plane in any reasonable way. The distance between the points is the weight of the edge between them. Thus you are dealing with a complete graph (all nodes have connections to all other nodes).
2. A printout of the program that generated the 100 points randomly.
3. A program printout for Kruskal's. If you use any code from someplace other than the text you must cite your source in your code.

HINT: Some of the above problems are really easy.

Turn-in list

As always this is just a summary. Details about what is desired is provided above. When handing-in code, please use enscript -2r and print it double-sided.

- Answers to each of the questions. Include any math, comments or code needed to justify your answers.
- A copy of your code.
- The three decrypted messages and associated pass-codes.
- An explanation of why hash_sets were useful.
- Time descriptions. Includes time to break the codes given and predictions about breaking longer codes.
- A copy of your code to find a MST.
- The MST itself. (Graphical representation would be nice, but is not required.)
- A copy of your code to generate 100 points.