CS 662 Assignment 7: Planning and Decision Trees
Assigned: Thursday, October 19.
Due: Thursday, November 7, at the start of class.
60 points total.
What to turn in: Written answers for questions 1 and 2. For question
3, hard copy of your source code.
Also, please put a copy of your code in the submit directory for this
class: /home/submit/cs662/(yourname)/assignment7. Everything necessary
to run your code should be in this directory. If anything out of the
ordinary is needed to run your code, please provide a README.
Question 1: Planning (20 Points)
Monkey and bananas is a classic AI toy planning problem. A
monkey is at the doorway of a room. Suspended from the ceiling in the
center of the room is a bunch of bananas. There is a box in the
corner of the room. The monkey's goal is to get the bananas. To do
this, it will need to push the box underneath the bananas and climb
up on the box. Our domain has three objects - the monkey, the box,
and the bananas.
We will use the following predicates:
- has(X) - the monkey is holding object X.
- at(object, loc) - the object (either the monkey or the box) is
at a particular location (valid locations are Corner, Center, Door).
- on(object1, object2) - object1 is on top of object2.
Our Monkey agent will have the following actions:
- move(oldLoc, newLoc) - this will move the monkey from oldLoc to
newLoc. In order to do this, the monkey must be in oldLoc.
- grab(object) - the monkey will grab the object. In order to grab
the bananas, the monkey must be on top of the box, and the box must
be in the center of the room.
- pushBox(oldloc, newloc) - moves the box from oldloc to
newloc. In order to perform this action, the monkey and the box must
both be in location oldloc. Afterwards, both are in location newloc.
- climb(object) - the monkey climbs onto the box. In order to do
this, the monkey must be in the same location as the box. After
performing this action, the monkey is on top of the box.
We will use the STRIPS representation and partial-order planning to
solve the problem. This is a pencil-and-paper exercise.
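As a reminder of the expected format for parts (a) and (b), a STRIPS action schema lists its preconditions, an add list, and a delete list. The example below is from the blocks world, not this assignment's domain, and its predicate names are illustrative only:

```
Action: stack(X, Y)
  Preconditions: holding(X), clear(Y)
  Add list:      on(X, Y), handEmpty
  Delete list:   holding(X), clear(Y)
```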
a) (5 points) Write STRIPS representations of the initial and
goal states using the predicates above.
b) (5 points) Write STRIPS representations for each of the
actions shown above.
c) (10 points) Trace the execution of the partial-order
planning (POP) algorithm on this problem. For each step, show the
list of open preconditions, the partial plan, and any ordering or
causal-link constraints that are added.
Question 2: Decision trees (by hand) (10 points):
Complete the PlayTennis example we started in class by hand. For each node,
show the entropy/information in the data set and the potential gain
for each possible attribute. Also, show the final tree. The data set
is included below.
| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | Hot         | High     | Weak   | No         |
| D2  | Sunny    | Hot         | High     | Strong | No         |
| D3  | Overcast | Hot         | High     | Weak   | Yes        |
| D4  | Rain     | Mild        | High     | Weak   | Yes        |
| D5  | Rain     | Cool        | Normal   | Weak   | Yes        |
| D6  | Rain     | Cool        | Normal   | Strong | No         |
| D7  | Overcast | Cool        | Normal   | Strong | Yes        |
| D8  | Sunny    | Mild        | High     | Weak   | No         |
| D9  | Sunny    | Cool        | Normal   | Weak   | Yes        |
| D10 | Rain     | Mild        | Normal   | Weak   | Yes        |
| D11 | Sunny    | Mild        | Normal   | Strong | Yes        |
| D12 | Overcast | Mild        | High     | Strong | Yes        |
| D13 | Overcast | Hot         | Normal   | Weak   | Yes        |
| D14 | Rain     | Mild        | High     | Strong | No         |
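As a sanity check on your hand calculations (not a substitute for showing your work), the entropy and information-gain formulas can be sketched in a few lines of Python; the function names here are my own:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index, label_index=-1):
    """Information gain from splitting `rows` on the attribute at attr_index."""
    labels = [r[label_index] for r in rows]
    gain = entropy(labels)
    for v in set(r[attr_index] for r in rows):
        subset = [r[label_index] for r in rows if r[attr_index] == v]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

# The PlayTennis rows: (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
print(round(entropy([r[-1] for r in data]), 3))  # full-set entropy, about 0.940
print(round(info_gain(data, 0), 3))              # gain for Outlook, about 0.247
```

These two numbers should match the root-node values in your hand trace.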
Question 3: Decision trees (coding) (30 points)
In this part, you'll write Python code to construct a decision tree
from a data file. I've included some skeleton code for you to use as a
starting point.
Your code should be able to run from the command line, and either a)
read in a training set, or b) read in a test set. For example:
python dt.py -train zoo
should read in the files zoo.csv (containing the data) and zoo.txt
(containing the labels for each attribute), construct a decision tree
representing the data, and write the decision tree out to a file
called zoo.pickle (using pickle).
python dt.py -test zoo
should read in the tree stored in zoo.pickle and use the test data in
zoo.test to determine the accuracy of your tree. It should print out a
result indicating the fraction of test cases correctly classified.
Your decision tree program should be able to work on any dataset
(don't hardcode in attributes or values).
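One way to satisfy the pickle requirement is sketched below. The dict here is a toy stand-in for whatever structure your tree-building code produces (the skeleton's TreeNode), and writing to a temp directory is my choice for the example; your program would use the dataset stem in the working directory:

```python
import os
import pickle
import tempfile

# Toy stand-in for the real decision tree (yours will be a TreeNode).
tree = {"attr": "Outlook", "children": {"Overcast": "Yes", "Sunny": "No"}}

stem = os.path.join(tempfile.gettempdir(), "zoo")
with open(stem + ".pickle", "wb") as f:   # -train writes <stem>.pickle
    pickle.dump(tree, f)
with open(stem + ".pickle", "rb") as f:   # -test reads it back
    loaded = pickle.load(f)
```

Any picklable tree structure round-trips this way, so -test can rebuild exactly what -train saved.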
In particular, it should be able to run on the following datasets:
- restaurant data:
- zoo data:
- credit screening data
(for testing purposes, you might also want to work with the tennis
example you solved by hand above. Here
is the metadata, and here is the data.)
If you are interested in trying your tree on other datasets, take a
look at the UC-
Irvine Machine Learning repository.
The restaurant dataset has two classes (WillWait and
WillNotWait). Many of the attributes have three or more
values; for example, restaurant type. The goal is to build a tree that
will tell us whether or not to wait for a table at a restaurant.
The zoo dataset has one unique trait (animal name), 15 boolean traits
(encoded as 0/1), and two integer-valued traits (numberOfLegs and
type). For type, 1=mammal, 2=bird,3=reptile,4=fish,5=amphibian,
6=insect, 7=crustacean. You may ignore the 'animalName'
trait. The task is to determine the animal's type (fish, mammal,
etc) from its other attributes.
The credit screening dataset is actual anonymized credit screening
data. The task is to determine whether an individual should or should
not be approved for a credit card. There are two things that make this
dataset challenging:
- Some values are continuous. You must decide on a scheme for
discretizing these values. You might choose to do this by
preprocessing the data, for example.
- Some attributes have missing values. You must decide how best to
deal with this problem. You may not add these values to the
dataset; instead, your program must be robust enough to deal with
them.
You are welcome to change the formats of the data files to make them
more readable (for example, changing 0 to False and 1 to
True). Depending on how you deal with the continuous-valued data in
the credit card problem, you might also want to change crx.txt. If you
do this, please include data files that work with your code in your
submit directory.
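For the continuous-value problem, one simple preprocessing scheme (an assumption for illustration, not the required approach) is to threshold each continuous column at its median:

```python
def discretize_column(rows, col):
    """Replace continuous values in column `col` with 'low'/'high', split at the median."""
    vals = sorted(float(r[col]) for r in rows)
    median = vals[len(vals) // 2]
    return [r[:col] + ["low" if float(r[col]) <= median else "high"] + r[col + 1:]
            for r in rows]

rows = [["1.0", "x"], ["2.0", "y"], ["9.0", "z"]]
out = discretize_column(rows, 0)   # 1.0 and 2.0 are <= the median (2.0); 9.0 is above
```

If you preprocess this way, remember to update the metadata file so the column's listed values become the new discrete ones.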
Building your decision tree
This file contains python code representing a
TreeNode object, plus stubs for most of the functions you should need to build
the tree.
The code assumes that the dataset itself is stored as a list of lists,
and that the metadata is a list of tuples. I've provided functions
that will create these data structures from input files.
You will find that using list comprehensions is a particularly
effective programming style for this sort of program. For example, to
extract the third column from the list-of-lists dataset for all rows in
which the last column contains a specific value, you can do:
[item[2] for item in dataset if item[-1] == val]
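Concretely, the two structures might look like this (the attribute names and value lists are taken from the tennis example above; the variable names are my own):

```python
# Each inner list is one example; the last column is the class label.
dataset = [
    ["Sunny", "Hot", "High", "Weak", "No"],
    ["Overcast", "Hot", "High", "Weak", "Yes"],
    ["Rain", "Mild", "High", "Weak", "Yes"],
]

# One (attributeName, listOfPossibleValues) tuple per column.
metadata = [
    ("Outlook", ["Sunny", "Overcast", "Rain"]),
    ("Temperature", ["Hot", "Mild", "Cool"]),
    ("Humidity", ["High", "Normal"]),
    ("Wind", ["Weak", "Strong"]),
    ("PlayTennis", ["Yes", "No"]),
]

# Third column (index 2) of every row whose label is "Yes".
humidity_of_yes = [row[2] for row in dataset if row[-1] == "Yes"]
```

The same one-line comprehension pattern covers most of the filtering and column extraction the tree builder needs.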
Constructing the training and test sets
In order to evaluate the effectiveness of your decision tree, you will
need to test it on data that was not used to construct the tree. This
will require you to construct a training set and a test set.
For this homework, we will train on 80% of the data, and test on
20%. You should build a separate Python program that can randomly
separate a data set into training and test sets. (Note - be sure this
separation is random; don't just take the first 80% of the lines in
the file.)
We will use this to perform n-fold cross-validation, where
n=5. In other words, repeat the following five times and average the
results:
- Create a random training and test set.
- Use the training set to construct a decision tree.
- Measure the performance of the tree on the test set; what
percentage of the test set was correctly classified? This is the
tree's accuracy.
What is the average accuracy of your tree on the
restaurant, zoo, and credit datasets?
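A minimal sketch of the random 80/20 splitter (the function name and the optional seed parameter are my own choices; seeding is just for reproducibility while debugging):

```python
import random

def split_dataset(rows, train_frac=0.8, seed=None):
    """Shuffle rows randomly, then split into (train, test) at train_frac."""
    rows = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)  # random split, not just the first 80%
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

train, test = split_dataset(range(100), seed=42)
# len(train) == 80, len(test) == 20, and together they cover all 100 rows
```

Running this five times with different seeds, rebuilding the tree each time, and averaging the five accuracies gives the cross-validated number the question asks for.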