CS 662 Assignment 1: Math Review and Basic Python
Assigned: Tuesday, August 28.
Due: Tuesday, September 4 at the start of class.
30 points
What to turn in: Please create a directory called 'assignment1' in your
subversion repository and place all your source code in there before the
beginning of class. Also, please bring a hard copy of your code and
the answers to the math questions, with your name on it, to class.
If you have messy handwriting, I would strongly recommend
typing this.
This assignment is meant to help you review the mathematics we'll need
in this class and to give you basic familiarity with Python. You should
use the references on the course homepage to help you learn Python;
if you already know a high-level language such
as Java or Perl, much of Python will look familiar.
You will be expected to use good software engineering principles in
this assignment (as in all assignments). This includes:
- Properly documented code
- Well-chosen variable names
- Modular code
- Code reuse when appropriate
- Appropriate error and exception handling and input checking
You will be graded according to both style and functionality.
Also, please read the assignments carefully before starting and make
sure that you a) do the tasks that are requested and b) complete all
of the requirements of the assignment.
Math Review: Please complete each of the following problems on a
separate page, showing your work where appropriate.
- Running time. (5 points) For each of the following
operations, give the time required to perform it in
big-O terms (e.g., O(n), O(n^2), etc.).
- Insertion into a hash table
- Looking up an element in a hash table
- Finding the largest element in an unsorted linked list
- Sorting an array with mergesort
- Finding all elements in the intersection of two sets, assuming
the sets are stored as linked lists.
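To see where the cost of the last operation comes from, here is a minimal Python sketch of the naive approach. It assumes each set is an unsorted singly linked list with no duplicates. The Node class and the from_list, contains, and intersection helpers are hypothetical names chosen for illustration.

```python
from typing import Optional


class Node:
    """A singly linked list node (hypothetical helper for illustration)."""

    def __init__(self, value, next_node: Optional["Node"] = None):
        self.value = value
        self.next = next_node


def from_list(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def contains(head, target):
    """Linear scan: may visit every node, so O(length of the list)."""
    node = head
    while node is not None:
        if node.value == target:
            return True
        node = node.next
    return False


def intersection(a, b):
    """For each of the n nodes in a, scan up to all m nodes of b: O(n * m)."""
    result = []
    node = a
    while node is not None:
        if contains(b, node.value):
            result.append(node.value)
        node = node.next
    return result
```

Counting how many times the loop inside contains runs, as a function of both list lengths, is exactly the analysis the question asks for; other representations (sorted lists, hash tables) would change it.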
- Probability. (5 points) Please show your work for each
of the following questions.
Assume we are playing a slot machine. We know that, each time we pull
the lever, the machine has probability n of paying us $1 and probability 1-n
of paying us $0 (0 <= n <= 1). Each pull of the lever
is independent - that is, your chance of winning at time i does not
depend on whether you have won in the past. It costs nothing to play.
- What is the probability of getting the following sequence: win,
win, lose, win, lose?
- What is the probability of playing five times and winning exactly
once?
- What is the probability of playing five times and winning at
least once?
- If I play the machine 10 times, how much can I expect to win?
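Once you have worked these out by hand, a short Python sketch like the one below can check your formulas numerically for a particular n. It assumes Python 3.8 or later (for math.comb) and represents a sequence of outcomes as True for a win and False for a loss; the function names are hypothetical.

```python
from math import comb


def _check_probability(n):
    """Reject values of n outside [0, 1]."""
    if not 0 <= n <= 1:
        raise ValueError(f"n must be between 0 and 1, got {n}")


def sequence_probability(outcomes, n):
    """Probability of one specific win/lose sequence.

    Pulls are independent, so multiply by n for each win and by (1 - n)
    for each loss.
    """
    _check_probability(n)
    prob = 1.0
    for won in outcomes:
        prob *= n if won else (1 - n)
    return prob


def prob_exactly(k, plays, n):
    """Binomial probability of exactly k wins in `plays` independent pulls."""
    _check_probability(n)
    return comb(plays, k) * n**k * (1 - n) ** (plays - k)


def prob_at_least_one(plays, n):
    """Complement of the event 'lose on every pull'."""
    _check_probability(n)
    return 1 - (1 - n) ** plays


def expected_winnings(plays, n):
    """Each pull pays $1 with probability n; by linearity of expectation
    the expected total is plays * n dollars."""
    _check_probability(n)
    return plays * n
```

Summing prob_exactly over every possible k (0 through the number of plays) should give 1, which is a handy sanity check on the binomial formula.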
- Trees. (5 points) Assume we have a perfect binary tree (every
internal node has two children and all leaves are on the bottom level)
with five levels.
- How many leaves does this tree have?
- How many total nodes does this tree have?
- Now let's generalize: write a summation that computes the total
number of nodes in the tree in terms of the number of levels (n).
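A small Python sketch can check a summation numerically. It assumes a perfect binary tree with the root counted as level 1, so level i holds 2**(i - 1) nodes and the leaves are the nodes on the last level; the function names are hypothetical.

```python
def nodes_at_level(level):
    """Each level of a perfect binary tree doubles the one above it;
    the root (level 1) is a single node."""
    if level < 1:
        raise ValueError("levels are numbered starting at 1")
    return 2 ** (level - 1)


def total_nodes(levels):
    """Add up the nodes on every level from 1 through `levels`."""
    return sum(nodes_at_level(i) for i in range(1, levels + 1))
```

Comparing total_nodes(n) against a closed form for several values of n is a quick way to check a summation you derived by hand.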
Python programming: For each of the following problems, write a Python
function to solve the problem. You should also provide a 'main'
function that allows your code to be run from the command line.
Problem 1: Palindrome checker (5 points)
Write a Python program that takes an input string as a command-line
argument (do NOT prompt the user for input) and determines whether the
string is a palindrome, that is, a string that reads the same backward
as forward. You may ignore case and non-alphanumeric characters
such as spaces and punctuation. Make your code as efficient as possible.
Sample usage:
brooks$ python palchecker.py "This is not a palindrome"
No
brooks$ python palchecker.py "Able was I ere I saw Elba"
Yes
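One possible approach, sketched in Python 3 under the assumptions above: two indices walk inward from the ends of the string, skipping non-alphanumeric characters and comparing the rest case-insensitively. This touches each character at most once and never builds a cleaned copy of the string. The names is_palindrome and main(argv) are illustrative, not prescribed by the assignment.

```python
import sys


def is_palindrome(text):
    """Return True if text reads the same forward and backward,
    ignoring case and non-alphanumeric characters."""
    left, right = 0, len(text) - 1
    while left < right:
        # Skip characters that do not count (spaces, punctuation).
        if not text[left].isalnum():
            left += 1
        elif not text[right].isalnum():
            right -= 1
        elif text[left].lower() != text[right].lower():
            return False
        else:
            left += 1
            right -= 1
    return True


def main(argv):
    """Check the single command-line argument; return an exit status."""
    if len(argv) != 1:
        print("Usage: palchecker.py STRING", file=sys.stderr)
        return 2
    print("Yes" if is_palindrome(argv[0]) else "No")
    return 0
```

As a script, the file would end with `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))`, which makes both sample invocations above work.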
Problem 2: Tag stripper (5 points)
Write a Python program that takes a URL as input, fetches that web
page, and strips out all of the HTML tags. (For the purposes of this
assignment, anything between < and > is a tag.) You should
implement two methods for doing this: one in which you scan the
document 'by hand' and keep track of opening and closing brackets, and
one in which you use regular expressions (from the re module)
to replace tags with empty strings. Your program should also accept a
command-line argument that lets the user specify which method is
used. (The by-hand method should be the default.)
Hints:
- Use the urllib module to fetch the page.
- The getopt module will be useful in working with command
line arguments.
- You should have a main, plus two functions for extracting
tags.
- When extracting tags with the re module, there's one wrinkle in
dealing with JavaScript: by default, the '.' character does not match a
newline. To get around this, compile the regex with the re.DOTALL
flag. (See the Python documentation or Dive Into Python for more
details.)
- My solution is about 35 lines. Over 2/3 of this is in main - the
functions for extracting tags are very short.
Sample usage:
brooks$ python tagstripper.py -r http://www.google.com
Google
Personalized Home | Sign in
Web Images VideoNew! News Maps more »
BooksFroogleGroupseven more » Advanced Search Preferences Language ToolsAdvertising Programs - Business Solutions - About Google©2006 Google
brooks$ python tagstripper.py http://www.google.com
Google
Personalized Home | Sign in
Web Images VideoNew! News Maps more »
BooksFroogleGroupseven more » Advanced Search Preferences Language ToolsAdvertising Programs - Business Solutions - About Google©2006 Google
brooks$ python tagstripper.py -q http://www.google.com
Usage: tagstripper.py {-r|-h} URL
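The structure described in the hints can be sketched in Python 3 as follows. Two things are assumptions rather than part of the assignment: in Python 3, urlopen lives in urllib.request (Python 2 had urllib.urlopen), and the page is decoded as UTF-8 with undecodable bytes replaced. The -r and -h flags match the usage line above; the helper names are hypothetical.

```python
import getopt
import re
import sys
from urllib.request import urlopen

# re.DOTALL lets '.' match newlines, so tags that span several lines
# (common around inline JavaScript) are still removed. '.*?' is
# non-greedy, so each match stops at the first '>'.
TAG_PATTERN = re.compile(r"<.*?>", re.DOTALL)


def strip_tags_by_hand(html):
    """Keep only characters outside <...>, tracking whether we are in a tag."""
    pieces = []
    inside_tag = False
    for char in html:
        if char == "<":
            inside_tag = True
        elif char == ">" and inside_tag:
            inside_tag = False
        elif not inside_tag:
            pieces.append(char)
    return "".join(pieces)


def strip_tags_regex(html):
    """Replace every <...> span with the empty string."""
    return TAG_PATTERN.sub("", html)


def fetch(url):
    """Download a page and decode it as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def main(argv):
    """Parse -r (regex) or -h (by hand, the default) and print the page text."""
    usage = "Usage: tagstripper.py {-r|-h} URL"
    try:
        opts, args = getopt.getopt(argv, "rh")
    except getopt.GetoptError:
        print(usage)
        return 2
    if len(args) != 1:
        print(usage)
        return 2
    stripper = strip_tags_by_hand
    for opt, _ in opts:
        if opt == "-r":
            stripper = strip_tags_regex
        elif opt == "-h":
            stripper = strip_tags_by_hand
    try:
        page = fetch(args[0])
    except (OSError, ValueError) as err:
        # URLError/HTTPError subclass OSError; a malformed URL raises ValueError.
        print(f"Could not fetch {args[0]}: {err}", file=sys.stderr)
        return 1
    print(stripper(page))
    return 0
```

As a script, the file would end with `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))`. Both strippers leave a stray '>' outside a tag untouched; they differ only on an unclosed '<', which the by-hand version treats as the start of a tag running to the end of the page.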
Problem 3: Word counter (5 points)
Write a Python program that takes a string as input (as a command-line
argument, or from standard input when no argument is given) and creates
a dictionary that maps each word in the string to the number of times
it occurs. Your program should have a function that creates the
dictionary; your main function should then display the dictionary
nicely. You should be able to combine this with the program from
Problem 2 to count the words in a web page.
Some hints:
- To read from the output of another command, use sys.stdin.read()
- For this problem, don't worry about case ('Cat' and 'cat' are
different) or stripping off punctuation marks.
- My solution, including error checking, is 20 lines long.
Sample Usage:
brooks$ python wordcounter.py 'cat cat dog Cat dog'
Cat : 1
dog : 2
cat : 2
brooks$ python tagstripper.py http://www.cs.usfca.edu | python ./wordcounter.py
all : 1
usfca : 1
particularly : 1
:) : 1
office : 1
demand : 1
developed : 1
Night : 2
(etc)
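A minimal Python 3 sketch of the dictionary-building step plus a main that falls back to sys.stdin.read() when no argument is given, so it also works at the end of a pipe. Words are split on whitespace, and case and punctuation are left alone, as the hints allow. The function names are illustrative.

```python
import sys


def count_words(text):
    """Map each whitespace-separated word to its number of occurrences.

    Case and punctuation are kept as-is, so 'Cat' and 'cat' are distinct.
    """
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def main(argv):
    """Count words in the argument, or in standard input if none is given."""
    if len(argv) > 1:
        print("Usage: wordcounter.py [STRING]", file=sys.stderr)
        return 2
    text = argv[0] if argv else sys.stdin.read()
    for word, count in count_words(text).items():
        print(f"{word} : {count}")
    return 0
```

As a script, the file would end with `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))`. In Python 3.7 and later, dictionaries keep insertion order, so the words print in order of first appearance rather than in the order shown in the sample output. collections.Counter would do the same counting, but the plain dictionary matches what the problem asks for.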