Project 2: Distributed Password Cracker (v 1.1)

Starter repository on GitHub: https://classroom.github.com/a/R5RmBthj

For this project, you’ll play the role of a mastermind hacker who has infiltrated the NSA, FBI, and CIA (anti-heroes are popular these days, right?). You have successfully retrieved the password databases of these organizations, which consist of lines containing (username, hash, type, length) tuples like the following:

aditya   f7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0  1 5 
carmen   64369a22cbc5686e2ccf609aae16fe42fa1178b4  1 6 
erika    250e77f12a5ab6972a0895d290c4792f0a326ea8  1 6 
jose     1bb942cd56e273e6600ccfb36afd61d8cec25fb7  0 10
liu      a3a73b6dfa8f4caedd0349f676ae46b39bdb7fbd  1 4 
matthew  f3075914d59377798c53341d273f8f9ebecfc428  1 9 
niha     f4169f30903c1fca747cdcd7c2d0081a79e23514  2 5 
yuki     0bb356f2fe7ed172175d5a7f59617d40cd3b2dba  2 6

Since it is bad practice to store passwords as plain text in a database, these entries appear to have been hashed. Using your knowledge of cryptography, you determine that the hashes were generated by the SHA-1 algorithm. A hash function maps data of arbitrary size to fixed-size values (hashes); SHA-1 produces a 160-bit digest, which is usually written as a 40-character hexadecimal string. You can see how SHA-1 works with the following command (run on the department Linux machines):

[jet01:~]$ echo -n "hello" | sha1sum
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d  -

In this case, hello maps to the hash string aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d.

Using social engineering, you determine that the password types are as follows:

0: Numeric passwords (0-9)
1: Characters (a-z, A-Z)
2: Alphanumeric (a-z, A-Z, and 0-9)

Given the password lengths and these parameters, we can design a program to brute-force the password hashes. Essentially, we’ll be doing hash inversions: generating a candidate string, hashing it with SHA-1, and then checking whether the resulting hash matches the target hash (you can use strcmp for this).

Unfortunately, brute-force password attacks are extremely compute-intensive: while modern computers can perform millions of hash inversions per second, the number of candidates grows exponentially with password length, so recovering even a short password can take hours.

This is where parallelism comes in: we’ll use MPI to split up the work across multiple processes running on multiple machines.

In this assignment, you will get more familiar with:

MPI Communication
Dividing up workloads
Coordinating between processes
MPI_Reduce, MPI_Iprobe
Taking performance measurements

You are given a complete sequential program and must parallelize it using MPI. Here’s a demo run for the completed, parallel version of the program:

# Test run on 52 machines
[jet01:~]$ mpirun -n 52 -hostfile ./jets.txt \
    ./passwd 5 d0be2dc421be4fcd0172e5afceea3970e2f3d940 alpha
Starting parallel password cracker.
Number of processes: 52
Coordinator node: jet01
Valid characters: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ (52)
Target password length: 5
Target hash: D0BE2DC421BE4FCD0172E5AFCEEA3970E2F3D940
[50|1000000] YhfQN -> 7CE41B80E290031CE7C0C0F2DA7B515D83647A63
[51|1000000] ZhfQN -> DB6520D595373C5EBE6AA9CFB2B7AA984D10CE68
[49|1000000] XhfQN -> 596D5C8254CEDED0B39F50ECE2A38C28CD9E18B4

 ... (outputs hidden for brevity) ...

[29|2000000] DolHB -> 97B3A2C557DA8FFCD182BD6B762F175A1D8288E4
[44|2000000] SolHB -> 6BA8A384B810E3E148578253C8AE24CB726DE919
[30|2000000] EolHB -> 194EEFC3EADB51E14270B5AE98AF8BCA5A1E02FC
Operation complete!
Time elapsed: 11.72s
Total passwords hashed: 113067452 (9645103.93/s)
Recovered password: apple

# Test run, local machine
# Note: this fails because we passed in the hash for the password 'apple',
# but are using the numeric set of valid characters.
[silicon:~]$ mpirun --oversubscribe -n 8 \
    ./passwd 5 d0be2dc421be4fcd0172e5afceea3970e2f3d940 numeric
Starting parallel password cracker.
Number of processes: 8
Coordinator node: silicon
Valid characters: 0123456789 (10)
Target password length: 5
Target hash: D0BE2DC421BE4FCD0172E5AFCEEA3970E2F3D940
Operation complete!
Time elapsed: 0.05s
Total passwords hashed: 80000 (1719355.80/s)
FAILED to recover password!

Implementation

Here are some guidelines for your implementation:

You will need to split the workload up across the number of processes available. For this assignment, it’s sufficient to give each process one or more of the valid password characters. For instance, Rank 0 might work on passwords starting with a and b, Rank 1 would handle c and d, and so on.
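You should print the current status of your search every 1 million hash inversions. This way you can see the progress your workers are making without printing too much information to the terminal. The format is [rank|num-hashes] followed by the current candidate and its hash, as in the demo output above: [50|1000000] YhfQN -> 7CE41B80E290031CE7C0C0F2DA7B515D83647A63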
Once a matching password has been found by a given rank, it should tell all the remaining processes to shut down. After all, it would be inefficient to let them keep running. You will use non-blocking MPI communication to accomplish this.
You’ll need to report how long the computation ran for, as well as the total number of password hashes computed. You can use Rank 0 to time the computation, along with a reduction operation to total up the final number of hashes.
Our SHA-1 library produces hashes as uppercase hexadecimal strings, but you shouldn’t require users to enter the target hash in a particular case.
Users should be able to select the set of valid password characters via an optional command line argument (such as alpha or numeric in the demo runs above). If no argument is provided, assume the password is alphanumeric.

Testing Your Code

You can generate test cases using the sha1sum command. If you’re a Mac user, you can install the coreutils package to get it (then run gsha1sum instead of sha1sum). Otherwise, use a Linux machine. Let’s say you want to test your program with an easy password, such as ‘hi’. This password is two characters long, so all you need to do is generate its SHA-1 hash:

echo -n 'hi' | sha1sum
c22b5f9178342609428d6f51b2c5af4c0bde6a42  -

Then pass the resulting hash, c22b5f9178342609428d6f51b2c5af4c0bde6a42, to your program along with the password length, 2.

Grading

The grade breakdown for this assignment is:

4pts Splitting up the workload evenly across available processes
1pt Printing status information during the search
1pt Correct detection of matching passwords
4pts Informing other processes that a match was found
4pts Shutting down worker processes immediately after finding a match
1pt Correctly reporting when there were no matching passwords found
4pts Final statistics (timing, hashes/sec, etc.)
1pt Search parameter printout (target hash, valid chars, etc.)
3pts Function documentation and comments
2pts Code style (no commented out blocks of code, unused variables, inconsistent indentation)
2pts Proper command line argument handling and error checking.
3pts Performance questions (edit README.md provided in the starter repository).

Extra Credit: You can earn 1 extra credit point by allowing for more fine-grained parallelism. Under the given assignment spec, parallelism is limited to the number of valid characters (i.e., if there are 26 valid characters, at most 26 processes can do useful work). Lift this restriction to earn the extra credit.

Changelog

First version posted (3/22)
Added info about print format (4/2)