# CS 662 Assignment 1: Math Review and Basic Python

Assigned: Tuesday, August 28.
Due: Tuesday, September 4 at the start of class.
30 points

What to turn in: Please create a directory called 'assignment1' in your Subversion repository and place all of your source code there before the beginning of class. Also, please bring to class a hard copy of your code and of your answers to the math questions, with your name on it. If you have messy handwriting, I strongly recommend typing your answers.

This assignment is meant to help you review the mathematics we'll need in this class and to get a basic familiarity with Python. You should use the references on the course homepage to help you learn Python; if you're familiar with a high-level language such as Java or Perl, much of Python will look pretty familiar.

You will be expected to use good software engineering principles in this assignment (as in all assignments). This includes:
• Properly documented code
• Well-chosen variable names
• Modular code
• Code reuse when appropriate
• Appropriate error and exception handling and input checking
You will be graded according to both style and functionality.

Also, please read the assignments carefully before starting and make sure that you a) do the tasks that are requested and b) complete all of the requirements of the assignment.

Math Review: Please complete each of the following problems on a separate page, showing your work where appropriate.
1. Running time. (5 points) For each of the following operations, give the time required to perform the operation in big-O terms (e.g. O(n), O(n^2), etc.).
• Insertion into a hash table
• Looking up an element in a hash table
• Finding the largest element in an unsorted linked list
• Sorting an array with mergesort
• Finding all elements in the intersection of two sets, assuming the sets are stored as linked lists.
2. Probability. (5 points) Please show your work for each of the following questions.
Assume we are playing a slot machine. We know that, each time we pull the lever, the machine pays us \$1 with probability n and \$0 with probability 1-n, where 0 <= n <= 1. Each pull of the lever is independent; that is, your chance of winning at time i does not depend on whether you have won in the past. It costs nothing to play.
• What is the probability of getting the following sequence: win, win, lose, win, lose?
• What is the probability of playing five times and winning exactly once?
• What is the probability of playing five times and winning at least once?
• If I play the machine 10 times, how much can I expect to win?
3. Trees. (5 points) Assume we have a binary tree with five levels in which every level is completely filled.
• How many leaves does this tree have?
• How many total nodes does this tree have?
• Now let's generalize: write a summation that computes the total number of nodes in the tree in terms of the number of levels (n).

Python programming: For each of the following problems, write a Python function to solve the problem. You should also provide a 'main' function that allows your code to be run from the command line.
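One common way to pair a worker function with a 'main' is sketched below. This is only an illustration, not a required layout: `shout` is a hypothetical stand-in for whatever function a problem asks for, and the file name in the usage message is made up. The sketch assumes `main` receives the argument list without the script name and returns an exit status.

```python
import sys


def shout(text):
    """Hypothetical placeholder for the function that solves the problem."""
    return text.upper()


def main(argv):
    """Check the arguments, call the worker function, return an exit status."""
    if len(argv) != 1:
        # Report misuse on stderr and signal failure to the shell.
        sys.stderr.write("Usage: skeleton.py STRING\n")
        return 2
    print(shout(argv[0]))
    return 0
```

To make the file runnable from the command line, it would typically end with `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))`, so that `main` runs only when the file is executed as a script and not when it is imported.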

Problem 1: Palindrome checker (5 points) Write a Python program that takes an input string as a command-line argument (do NOT prompt the user for input) and determines whether the string is a palindrome, that is, a word or phrase that reads the same backwards as forwards. When comparing, ignore case and non-alphanumeric characters such as spaces and punctuation. Make your code as efficient as possible.
Sample usage:
```
brooks$ python palchecker.py "This is not a palindrome"
No

brooks$ python palchecker.py "Able was I ere I saw Elba"
Yes
```
Problem 2: Tag stripper (5 points)

Write a Python program that takes a URL as input, fetches that web page, and strips out all of the HTML tags. (For purposes of this assignment, anything between < and > is a tag.) You should implement two methods for doing this: one in which you scan the document 'by hand' and keep track of opening and closing brackets, and one in which you use regular expressions (from the re module) to replace tags with empty strings. Your program should also accept a command-line argument that lets the user specify which method is used, with 'by hand' as the default.
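As a rough illustration of selecting a method from the command line, the sketch below uses the standard getopt module. The flag letters follow the usage line in the sample output (`-r` for regular expressions, `-h` for by hand); the function name `parse_args` and the method labels 'hand' and 'regex' are assumptions made for this example only.

```python
import getopt


def parse_args(argv):
    """Return (method, url), where method is 'hand' or 'regex'.

    Raises getopt.GetoptError for an unknown flag or a missing URL.
    """
    # "rh" declares two flags, -r and -h, neither of which takes a value.
    opts, args = getopt.getopt(argv, "rh")
    method = "hand"  # by hand is the default
    for flag, _value in opts:
        if flag == "-r":
            method = "regex"
        elif flag == "-h":
            method = "hand"
    if len(args) != 1:
        raise getopt.GetoptError("expected exactly one URL")
    return method, args[0]
```

A main function could call `parse_args(sys.argv[1:])` inside a try block and print the usage message when `getopt.GetoptError` is raised.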

Hints:
• Use the urllib module to fetch the page.
• The getopt module will be useful in working with command line arguments.
• You should have a main, plus two functions for extracting tags.
• When extracting tags with the re module, there's one wrinkle in dealing with JavaScript, which often spans several lines. By default, the '.' character does not match a newline. To get around this, compile your regex with the re.DOTALL option. (See the Python documentation or Dive Into Python for more details.)
• My solution is about 35 lines. Over 2/3 of this is in main - the functions for extracting tags are very short.
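The newline behavior mentioned in the hints can be seen in a small experiment. The sketch below deliberately uses a made-up pattern and input string unrelated to HTML; `matches` and `TEXT` are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical input: "start" and "end" sit on different lines.
TEXT = "start of a block\nthat runs onto a second line before the end"


def matches(pattern, text, flags=0):
    """Return True if the compiled pattern matches anywhere in text."""
    return re.compile(pattern, flags).search(text) is not None


# Without re.DOTALL, '.' stops at the newline, so the pattern cannot
# reach "end" on the second line.
print(matches(r"start.*end", TEXT))

# With re.DOTALL, '.' also matches the newline and the search succeeds.
print(matches(r"start.*end", TEXT, re.DOTALL))
```

The first call prints False and the second prints True, which is the difference that matters when the text to be matched spans more than one line.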
Sample usage:
```
brooks$ python tagstripper.py -r http://www.google.com

Web    Images    VideoNew!    News    Maps    more »

Web    Images    VideoNew!    News    Maps    more »

Usage: tagstripper.py {-r|-h} URL
```
Problem 3: Word counter (5 points)

Write a Python program that takes a string as input from the command line and creates a dictionary that maps each word in the string to the number of times it occurs in the string. Within the program, one function should create the dictionary, and your main function should then display it in a readable format. You should be able to combine this with the program from Problem 2 to count the words in a web page.

Some hints:
• For this problem, don't worry about case ('Cat' and 'cat' are different) or stripping off punctuation marks.
• My solution, including error checking, is 20 lines long.
Sample usage:
```
brooks$ python wordcounter.py 'cat cat dog Cat dog'
Cat : 1
dog : 2
cat : 2
brooks$ python tagstripper.py http://www.cs.usfca.edu | python ./wordcounter.py
all : 1
usfca : 1
particularly : 1
:) : 1
office : 1
demand : 1
developed : 1
Night : 2

(etc)
```