# CS 662 Assignment 7: Planning and Decision Trees

Assigned: Thursday, October 19.
Due: Thursday, November 7, at the start of class.
60 points total.

What to turn in: written answers for questions 1 and 2, and a hard copy of your source code for question 3.

Also, please put a copy of your code in the submit directory for this class: /home/submit/cs662/(yourname)/assignment7. Everything necessary to run your code should be in this directory. If anything out of the ordinary is needed to run your code, please provide a README.

Question 1: Planning (20 points)

Monkey and bananas is a classic AI toy planning problem. A monkey is at the doorway of a room. Suspended from the ceiling in the center of the room is a bunch of bananas. There is a box in the corner of the room. The monkey's goal is to get the bananas. To do this, it will need to push the box underneath the bananas and climb up on the box. Our domain has three objects - the monkey, the box, and the bananas.

We will use the following predicates:
• has(X) - the monkey is holding object X.
• at(object, loc) - the object (either the monkey or the box) is at a particular location (valid locations are Corner, Center, Door).
• on(object1, object2) - object1 is on top of object2.
Our Monkey agent will have the following actions:
• move(oldLoc, newLoc) - this will move the monkey from oldLoc to newLoc. In order to do this, the monkey must be in oldLoc.
• grab(object) - the monkey will grab the object. In order to grab the bananas, the monkey must be on top of the box, and the box must be in the center of the room.
• pushBox(oldloc, newloc) - moves the box from oldloc to newloc. In order to perform this action, the monkey and the box must both be in location oldloc. Afterwards, both are in location newloc.
• climb(object) - the monkey climbs onto the box. In order to do this, the monkey must be in the same location as the box. After performing this action, the monkey is on top of the box.
We will use the STRIPS representation and partial-order planning to solve the problem. This is a pencil-and-paper exercise.
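Although this is a pencil-and-paper exercise, you can sanity-check your action definitions by simulating state progression. The sketch below uses one illustrative Python encoding of a STRIPS action (precondition list, add list, delete list); the field names and the single action shown are just a format example, not the required answer.

```python
# Illustrative STRIPS-style action: preconditions that must hold,
# facts the action adds, and facts it deletes. States are sets of
# ground propositional facts written as strings.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    preconditions: frozenset
    add_list: frozenset
    delete_list: frozenset

    def applicable(self, state):
        # The action applies when every precondition is in the state.
        return self.preconditions <= state

    def apply(self, state):
        # STRIPS semantics: remove the delete list, then add the add list.
        return (state - self.delete_list) | self.add_list

# One example action in this encoding:
move = Action(
    name="move(Door, Center)",
    preconditions=frozenset({"at(Monkey, Door)"}),
    add_list=frozenset({"at(Monkey, Center)"}),
    delete_list=frozenset({"at(Monkey, Door)"}),
)

state = frozenset({"at(Monkey, Door)", "at(Box, Corner)"})
if move.applicable(state):
    state = move.apply(state)
```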

a) (5 points) Write STRIPS representations of the initial and goal states using the predicates above.

b) (5 points) Write STRIPS representations for each of the actions shown above.

c) (10 points) Trace the execution of the partial-order planning (POP) algorithm on this problem. For each step, show the list of open preconditions, the partial plan, and any ordering or causal constraints.

Question 2: Decision Trees (by hand) (10 points)

Complete the PlayTennis example we started in class by hand. For each node, show the entropy/information in the data set and the potential gain for each possible attribute. Also, show the final tree. The data set is included below.

| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | Hot         | High     | Weak   | No         |
| D2  | Sunny    | Hot         | High     | Strong | No         |
| D3  | Overcast | Hot         | High     | Weak   | Yes        |
| D4  | Rain     | Mild        | High     | Weak   | Yes        |
| D5  | Rain     | Cool        | Normal   | Weak   | Yes        |
| D6  | Rain     | Cool        | Normal   | Strong | No         |
| D7  | Overcast | Cool        | Normal   | Strong | Yes        |
| D8  | Sunny    | Mild        | High     | Weak   | No         |
| D9  | Sunny    | Cool        | Normal   | Weak   | Yes        |
| D10 | Rain     | Mild        | Normal   | Weak   | Yes        |
| D11 | Sunny    | Mild        | Normal   | Strong | Yes        |
| D12 | Overcast | Mild        | High     | Strong | Yes        |
| D13 | Overcast | Hot         | Normal   | Weak   | Yes        |
| D14 | Rain     | Mild        | High     | Strong | No         |
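If you want to check your by-hand numbers, entropy and information gain can be computed directly. A minimal sketch, assuming each row ends with its class label (only the Outlook column is included here, for brevity):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, attr_index):
    """Entropy of the labels minus the expected entropy after splitting
    on the attribute in column attr_index (label is last in each row)."""
    base = entropy([row[-1] for row in rows])
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# (Outlook, PlayTennis) pairs from the table above.
rows = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"),
        ("Rain", "Yes"), ("Rain", "Yes"), ("Rain", "No"),
        ("Overcast", "Yes"), ("Sunny", "No"), ("Sunny", "Yes"),
        ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rain", "No")]
```

With these rows, the entropy of the full label set comes out to about 0.940 bits, and the gain for splitting on Outlook to about 0.247.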

Question 3: Decision Trees (coding) (30 points)

In this part, you'll write Python code to construct a decision tree from a data file. I've included some skeleton code for you to use as a template.

Your code should run from the command line and either (a) read in a training set or (b) read in a test set. For example:

```
python dt.py -train zoo
```

should read in the files zoo.csv (containing the data) and zoo.txt (containing the labels for each attribute), construct a decision tree representing the data, and write the decision tree out to a file called zoo.pickle (using pickle).

```
python dt.py -test zoo
```

should read in the tree stored in zoo.pickle and use the test data in zoo.test to determine the accuracy of your tree. It should print out the fraction of test cases correctly classified.
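The pickle round trip itself is just `dump` and `load`. A minimal sketch (this `TreeNode` is a placeholder with invented fields; your real class comes from the provided skeleton code):

```python
import pickle

class TreeNode:
    """Placeholder node; attribute to split on, children keyed by
    attribute value, and a class label at leaves."""
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute
        self.children = children or {}
        self.label = label

tree = TreeNode(attribute="Outlook",
                children={"Overcast": TreeNode(label="Yes")})

# -train: write the finished tree out.
with open("zoo.pickle", "wb") as f:
    pickle.dump(tree, f)

# -test: read the tree back in before classifying the test set.
with open("zoo.pickle", "rb") as f:
    loaded = pickle.load(f)
```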

Your decision tree program should be able to work on any dataset (don't hardcode in attributes or values).

In particular, it should be able to run on the following datasets:
• restaurant data
• zoo data
• credit screening data

(For testing purposes, you might also want to work with the tennis example you solved by hand above. Here is the metadata, and here is the data.)

If you are interested in trying your tree on other datasets, take a look at the UC Irvine Machine Learning Repository.

The restaurant dataset has two classes (WillWait and WillNotWait). Many of the attributes have three or more values, for example, restaurant type. The goal is to build a tree that will tell us whether or not to wait for a restaurant on a particular evening.

The zoo dataset has one unique trait (animal name), 15 boolean traits (encoded as 0/1), and two integer-valued traits (numberOfLegs and type). For type, 1=mammal, 2=bird, 3=reptile, 4=fish, 5=amphibian, 6=insect, 7=crustacean. You may ignore the 'animalName' trait. The task is to determine the animal's type (fish, mammal, etc.) from its other attributes.

The credit screening dataset is actual anonymized credit screening data. The task is to determine whether an individual should or should not be approved for a credit card. There are two things that make this dataset interesting:
1. Some values are continuous. You must decide on a scheme for discretizing these values. You might choose to do this by preprocessing the data, for example.
2. Some attributes have missing values. You must decide how best to deal with this problem. You may not simply add these values to the dataset by hand; instead, your program must be robust enough to deal with missing data.
You are welcome to change the formats of the data files to make them more readable (for example, changing 0 to False and 1 to True). Depending on how you deal with the continuous-valued data in the credit card problem, you might also want to change crx.txt. If you do this, please include data files that work with your code in your submit directory.
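One possible preprocessing scheme for the continuous columns is a median split; the sketch below is just an illustration (the 'low'/'high' labels and the '?' missing-value marker are assumptions, not requirements):

```python
def median(nums):
    """Median of a non-empty list of numbers."""
    s = sorted(nums)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def discretize_column(rows, col, missing="?"):
    """Replace continuous values in one column with 'low'/'high'
    relative to the column median, leaving missing entries alone."""
    known = [float(r[col]) for r in rows if r[col] != missing]
    m = median(known)
    for r in rows:
        if r[col] != missing:
            r[col] = "low" if float(r[col]) <= m else "high"
    return rows

rows = [["1.5", "a"], ["3.0", "b"], ["?", "c"], ["4.5", "d"]]
discretize_column(rows, 0)
```

A two-way split keeps the tree code unchanged (the discretized column is just another nominal attribute); finer binning is equally valid if you prefer it.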

Building your decision tree

This file contains Python code representing a TreeNode object, plus stubs for most of the functions you should need to build this program.

The code assumes that the dataset itself is stored as a list of lists, and that the metadata is a list of tuples. I've provided functions that will create these data structures from input files.
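For concreteness, the assumed shapes look something like this (the attribute names are taken from the tennis example above; the exact tuple contents come from the provided parsing functions):

```python
# Dataset: one inner list per example, with the class label last
# (as the list comprehension below assumes).
dataset = [
    ["Sunny", "Hot", "High", "Weak", "No"],
    ["Overcast", "Hot", "High", "Weak", "Yes"],
]

# Metadata: one (attributeName, possibleValues) tuple per column.
metadata = [
    ("Outlook", ["Sunny", "Overcast", "Rain"]),
    ("Temperature", ["Hot", "Mild", "Cool"]),
    ("Humidity", ["High", "Normal"]),
    ("Wind", ["Weak", "Strong"]),
    ("PlayTennis", ["Yes", "No"]),
]
```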

You will find that using list comprehensions is a particularly effective programming style for this sort of program. For example, to extract the column at index 3 from the list-of-lists dataset, for all rows in which the last column contains a specific value, you can do:

```
[item[3] for item in dataset if item[-1] == val]
```

Constructing the training and test sets

In order to evaluate the effectiveness of your decision tree, you will need to test it on data that was not used to construct the tree. This will require you to construct a training set and a test set.

For this homework, we will train on 80% of the data, and test on 20%. You should build a separate Python program that can randomly separate a data set into training and test sets. (Note - be sure this separation is random; don't just take the first 80% of the lines in the file).
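A random 80/20 split can be sketched with the standard library's `random` module (the function name is just a suggestion):

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=None):
    """Shuffle a copy of the rows, then split into (train, test)."""
    rows = list(rows)                      # don't mutate the caller's list
    random.Random(seed).shuffle(rows)      # random order, not file order
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

data = list(range(100))
train, test = split_dataset(data, seed=0)
```

Passing a seed makes a run reproducible while still satisfying the "random, not the first 80% of lines" requirement.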

We will use this to perform n-fold cross-validation with n = 5. In other words, repeat the following steps five times and average the results.
1. Create a random training and test set.
2. Use the training set to construct a decision tree.
3. Measure the performance of the tree on the test set; what percentage of the test set was correctly classified? This is the tree's accuracy.

What is the average accuracy of your tree on the restaurant, zoo and credit datasets?