CS 662 Assignment 7: Planning and Decision Trees
Assigned: Tuesday, October 23
Due: Thursday, November 1 at the start of class.
30 points total.
What to turn in: Written answers for questions 1 and 2. For question
3, hard copy of your source code.
Also, please put a copy of your code in the submit directory for this
class: /home/submit/cs662/(yourname)/assignment6. Everything necessary
to run your code should be in this directory. If anything out of the
ordinary is needed to run your code, please provide a README.
Question 1: Planning (10 Points)
We want to write a STRIPS planner to help students figure out how to
organize their lives. It will tell them in what order to get ready for
school, do their homework, and go to class.
We will use the following predicates:
- wearing(X) - the student is wearing object X
- at(loc) - the student is at location loc
- cleanTeeth - the student's teeth are clean
- hasEaten - the student has eaten.
- completed(X) - task X is completed
- attendingClass - the student is attending class.
Our student will have the following actions:
- move(oldLoc, newLoc) - this will move the student from oldLoc to
newLoc. In order to do this, the student must be in
oldLoc. Afterward, he/she is in newLoc, and not in oldLoc.
- getDressed. This will cause the student to be wearing
clothing. In order to do this, the student must be wearing
pajamas. Afterward, they are wearing clothing, and not wearing
pajamas.
- BrushTeeth. This will cause the student's teeth to be clean. To
do this, he/she must be at home.
- EatBreakfast. This will cause the student to have eaten, and
cause their teeth to be dirty. In order to eat breakfast, the
student must be at home.
- DoHomework. This will cause the student's homework to be
completed.
- AttendClass. This will cause the student to go to their class. In
order to do this, the student must be at school.
Our starting state is:
Start: at(Home), !hasEaten, !completed(Homework), wearing(Pajamas)
Our goal state is:
Goal: attendingClass, wearing(Clothing), cleanTeeth, hasEaten, completed(Homework)
We will use the STRIPS representation and partial-order planning to
solve the problem. This is a pencil-and-paper exercise.
a) (4 points) Write STRIPS representations for each of the
actions shown above.
b) (6 points) Trace the execution of the partial-order
planning (POP) algorithm on this problem. For each step, show the
list of open preconditions, the partial plan, and any ordering or
causal constraints.
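Although this question is a pencil-and-paper exercise, it can help to see how a STRIPS action behaves as a data structure. The sketch below is one possible encoding, not a required format: it uses only the move action, whose conditions are spelled out in the action list above, and it assumes a state is a set of positive literals (so a fact such as !hasEaten is represented by hasEaten being absent). The names Action, make_move, applicable and apply_action are made up for illustration.

```python
from collections import namedtuple

# Hypothetical container: a STRIPS action is a name plus three sets of literals.
Action = namedtuple("Action", ["name", "preconds", "add_list", "delete_list"])


def make_move(old_loc, new_loc):
    """Ground instance of move(oldLoc, newLoc) from the action list."""
    return Action(
        name="move(%s, %s)" % (old_loc, new_loc),
        preconds=frozenset(["at(%s)" % old_loc]),     # must be in oldLoc
        add_list=frozenset(["at(%s)" % new_loc]),     # afterward, in newLoc
        delete_list=frozenset(["at(%s)" % old_loc]),  # and no longer in oldLoc
    )


def applicable(state, action):
    """An action is applicable when every precondition holds in the state."""
    return action.preconds <= state


def apply_action(state, action):
    """Return the successor state: remove the delete list, then add the add list."""
    if not applicable(state, action):
        raise ValueError("preconditions not satisfied for " + action.name)
    return (state - action.delete_list) | action.add_list


# Assumption: only positive literals are stored in the state.
state = frozenset(["at(Home)", "wearing(Pajamas)"])
new_state = apply_action(state, make_move("Home", "School"))
```

Here new_state contains at(School) and wearing(Pajamas), and at(Home) has been removed.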
Question 2: Decision trees (by hand) (5 points):
Complete the PlayTennis example we started in class by hand. For each node,
show the entropy/information in the data set and the potential gain
for each possible attribute. Also, show the final tree. The data set
is included below.
| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |
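If you want to check your hand calculations, entropy and information gain are short to compute. The following is a minimal sketch, assuming each row is a list whose last element is the class label. The function names entropy and information_gain and the toy rows are made up for illustration; the toy rows are not taken from the PlayTennis table.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    total = len(labels)
    result = 0.0
    for count in Counter(labels).values():
        p = float(count) / total
        result -= p * log(p, 2)
    return result


def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in set(row[attr_index] for row in rows):
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += (float(len(subset)) / len(rows)) * entropy(subset)
    return entropy(labels) - remainder


# Toy data (hypothetical): attribute 0 separates the classes perfectly,
# attribute 1 tells us nothing about the class.
toy_rows = [
    ["a", "x", "Yes"],
    ["a", "y", "Yes"],
    ["b", "x", "No"],
    ["b", "y", "No"],
]
gain_attr0 = information_gain(toy_rows, 0)
gain_attr1 = information_gain(toy_rows, 1)
```

On the toy rows, the first attribute has a gain of one bit and the second has a gain of zero, which is the kind of comparison you are asked to show at each node.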
Question 3: Decision Trees (coding) (15 points)
In this part, you'll write Python code to construct a decision tree
from a data file. I've included some skeleton code for you to use as a
template.
Your code should be able to run from the command line, and either a)
read in a training set, or b) read in a test set. For example:
python dt.py -train restaurant
should read in the files restaurant.csv (containing the data) and restaurant.txt
(containing the labels for each attribute), construct a decision tree
representing the data, and write the decision tree out to a file
called restaurant.pickle (using cPickle).
python dt.py -test restaurant
should read in the tree stored in restaurant.pickle
and use the test data in restaurant.test to determine the accuracy of your
tree. It should print out a result indicating the fraction of test
cases correctly classified.
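The command-line handling and the pickle round trip described above might look something like the following. This is only a sketch under stated assumptions: the helper names parse_args, save_tree and load_tree are hypothetical and are not part of the skeleton code, and the import falls back to the standard pickle module on Python versions that no longer ship cPickle.

```python
try:
    import cPickle as pickle  # Python 2, as named in the assignment
except ImportError:
    import pickle             # Python 3 folded cPickle into pickle


def parse_args(argv):
    """Return (mode, stem) for arguments such as ['-train', 'restaurant']."""
    if len(argv) != 2 or argv[0] not in ("-train", "-test"):
        raise SystemExit("usage: python dt.py (-train | -test) <dataset>")
    return argv[0][1:], argv[1]


def save_tree(tree, stem):
    """Write the tree to <stem>.pickle, e.g. restaurant.pickle."""
    with open(stem + ".pickle", "wb") as handle:
        pickle.dump(tree, handle)


def load_tree(stem):
    """Read a previously saved tree back from <stem>.pickle."""
    with open(stem + ".pickle", "rb") as handle:
        return pickle.load(handle)
```

In dt.py you would call parse_args(sys.argv[1:]) and branch on the returned mode. Note that pickle files are opened in binary mode.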
Your decision tree program should be able to work on any dataset
(don't hardcode in attributes or values).
In particular, it should be able to run on the following datasets:
- restaurant data:
- mushroom data:
(You might also want to test it on the tennis dataset from question 2,
since you know what that tree should look like.)
If you are interested in trying your tree on other datasets, take a
look at the UC Irvine Machine Learning Repository.
The restaurant dataset has two classes (WillWait and
WillNotWait). Many of the attributes have three or more
values, for example, restaurant type. The goal is to build a tree that
will tell us whether or not to wait for a restaurant on a
particular evening.
The Mushroom dataset has two classes (edible and poisonous). There are
22 attributes, some of which have two values and some of which have
more. In addition, trait 11 (stalk-root) has some missing values. You must
decide how to best deal with this problem. You may not edit the data
file to add in values; instead, your program must be robust enough to
deal with missing data. There are several possible solutions; you might
select a default value, or the most common value, for example.
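As an illustration of the "most common value" option, here is a minimal sketch that fills in missing entries in memory, leaving the data file untouched. It assumes missing entries are marked with "?" (check how your copy of the data marks them), and the function name fill_missing is made up for this example. Other strategies may work as well or better.

```python
from collections import Counter


def fill_missing(dataset, column, missing="?"):
    """Return a copy of dataset with missing entries in `column` replaced
    by the most common known value in that column."""
    known = [row[column] for row in dataset if row[column] != missing]
    if not known:
        # Nothing to infer from; return an unchanged copy.
        return [list(row) for row in dataset]
    most_common = Counter(known).most_common(1)[0][0]
    return [
        [most_common if (i == column and value == missing) else value
         for i, value in enumerate(row)]
        for row in dataset
    ]


# Toy rows (hypothetical), with the class label in the last position.
rows = [["b", "e"], ["?", "p"], ["b", "e"], ["c", "p"]]
filled = fill_missing(rows, 0)
```

The original list of rows is not modified, so you can compare results with and without the substitution.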
Building your decision tree
This file contains Python code representing a
TreeNode object, plus stubs for most of the functions you should need to build
this program.
The code assumes that the dataset itself is stored as a list of lists,
and that the metadata is a list of tuples. I've provided functions
that will create these data structures from input files.
You will find that using list comprehensions is a particularly
effective programming style for this sort of program. For example, to
extract the column at index 3 (the fourth column) from the list-of-lists dataset for all rows in
which the last column contains a specific value, you can do:
[item[3] for item in dataset if item[-1] == val]
Constructing the training and test sets
In order to evaluate the effectiveness of your decision tree, you will
need to test it on data that was not used to construct the tree. This
will require you to construct a training set and a test
set.
For this homework, we will train on 80% of the data, and test on
20%. You should build a separate Python program that can randomly
separate a data set into training and test sets. (Note - be sure this
separation is random; don't just take the first 80% of the lines in
the file).
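One way to do the random split is to shuffle a copy of the rows and cut it at the 80% mark. The sketch below assumes the data is already loaded as a list of rows; the function name split_dataset and its seed parameter are hypothetical, and the seed is there only so a split can be reproduced while debugging.

```python
import random


def split_dataset(rows, train_fraction=0.8, seed=None):
    """Shuffle a copy of rows and split it into (training, test) lists."""
    rng = random.Random(seed)   # private generator; seed=None gives a fresh split
    shuffled = list(rows)       # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy rows (hypothetical) standing in for a loaded dataset.
example_rows = [[i, "Yes"] for i in range(10)]
train_rows, test_rows = split_dataset(example_rows, seed=0)
```

With ten rows this gives eight training rows and two test rows. Your separate program would then write each list out to its own file.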
We will use this to perform n-fold cross-validation, where
n=5. In other words, repeat the following steps five times and
average the results:
- Create a random training and test set.
- Use the training set to construct a decision tree.
- Measure the performance of the tree on the test set; what
percentage of the test set was correctly classified? This is the
tree's accuracy.
What is the average accuracy of your tree on the
restaurant and mushroom datasets?
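The evaluation loop above can be sketched as follows. This is an outline under assumptions, not the required structure: build_tree and classify are passed in as parameters because their real names and signatures depend on your own code, and the majority-class learner at the bottom is only a stand-in so that the example runs on its own.

```python
import random
from collections import Counter


def accuracy(classify, tree, test_rows):
    """Fraction of test rows whose predicted label matches the last column."""
    correct = sum(1 for row in test_rows if classify(tree, row[:-1]) == row[-1])
    return float(correct) / len(test_rows)


def average_accuracy(rows, build_tree, classify, trials=5,
                     train_fraction=0.8, seed=None):
    """Repeat: random split, build a tree, score it; then average the scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = list(rows)
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        tree = build_tree(shuffled[:cut])
        scores.append(accuracy(classify, tree, shuffled[cut:]))
    return sum(scores) / len(scores)


# Stand-in learner for demonstration only: always predicts the majority class.
def majority_build(train_rows):
    return Counter(row[-1] for row in train_rows).most_common(1)[0][0]


def majority_classify(tree, features):
    return tree


# Toy rows (hypothetical) in which every example has the same label.
demo_rows = [[i, "Yes"] for i in range(10)]
demo_score = average_accuracy(demo_rows, majority_build, majority_classify, seed=1)
```

Replace the stand-in functions with your own tree-building and classification functions to get the averages asked for above.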