CS662 Assignment 9: Utility, perceptrons, and programming potpourri.

Assigned: Tuesday, November 27.
Due: Monday, December 10 at 9:00 am. No late assignments accepted.
40 points total.

To turn in: For problems 1, 2, and 3, submit typed or handwritten answers. For the coding problems, please submit all code to a folder named assignment9 in your Subversion repository, and also submit a hard copy of your code.

Question 1. Utility. (5 points)
(From R&N, p. 610): Tickets to a lottery cost $1. There are two possible prizes: a $10 payoff, with probability 1/50, and a $1,000,000 payoff, with probability 1/2,000,000. What is the expected value of a lottery ticket? What is the optimal number of tickets to buy, assuming your utility for money is linear?
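
If it helps to see how an expected-value calculation is set up, here is a minimal sketch. The function name expected_net_value and the lottery in the usage line are made up for illustration; they are not the numbers from this question.

```python
from fractions import Fraction

def expected_net_value(cost, prizes):
    """Expected net gain of one ticket: sum of p * payoff, minus the ticket cost.

    prizes is a list of (payoff, probability) pairs.
    """
    return sum(p * payoff for payoff, p in prizes) - cost

# Illustrative lottery only: a $2 ticket, a $5 prize with probability 1/10,
# and a $100 prize with probability 1/200.
example = expected_net_value(2, [(5, Fraction(1, 10)), (100, Fraction(1, 200))])
print(example)  # -1
```

Using Fraction keeps the arithmetic exact, which makes it easier to check a hand calculation against the code.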

Question 2. Value of information. (10 points total)

Suppose that our route-finding agent is trying to suggest a route for us to get from USF to Oakland. We want to minimize the expected travel time. We know that, when the Bay Bridge is busy, it takes 1 hour to drive there, and when the Bay Bridge is not busy, it takes 30 minutes to drive there. We know that taking BART always takes 40 minutes. We also know that the Bay Bridge is busy 40% of the time.

a. (2 points) Without any other information, should we drive or take BART? Show all necessary work.

b. (2 points) Suppose that we can spend five minutes checking a traffic website to see if the bridge is actually busy. We know that 90% of the time when the bridge is actually busy, the site will say it's busy: P(site | busy) = 0.9, where "site" is the event that the site reports the bridge as busy. We also know that 20% of the time when the bridge is not actually busy, the site will still say it's busy: P(site | !busy) = 0.2.

Use Bayes' rule to determine the probability that the bridge is actually busy if the site says it is, P(busy | site).
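
As a reminder of the shape of a Bayes' rule calculation for a two-valued hypothesis, here is a small sketch. The function name posterior and the numbers in the usage line are hypothetical and deliberately differ from the ones in this problem.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(h | evidence) by Bayes' rule, for a hypothesis h that is either true or false."""
    # Total probability of the evidence, summed over both cases of h.
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1 - prior))
    return p_evidence_given_h * prior / p_evidence

# Illustrative numbers only: prior 0.5, P(e | h) = 0.8, P(e | !h) = 0.2.
print(round(posterior(0.5, 0.8, 0.2), 4))  # 0.8
```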

c. (3 points) If the site says the bridge is busy, what should we do? What if the site says the bridge is not busy? Show all work.

d. (3 points) Use a value of information calculation to determine whether it is worth it for us to spend five minutes checking the traffic website.
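
For reference, a value of information calculation compares the best expected cost without the observation to the expected cost when we pick the best action after each possible observation. The sketch below shows that structure on an unrelated umbrella example; the function names, the dictionary layout, and all of the numbers are assumptions made for illustration, not part of the assignment.

```python
def best_expected_cost(costs, p_state):
    """Lowest expected cost over all actions, given a distribution over states."""
    return min(sum(p_state[s] * action_costs[s] for s in p_state)
               for action_costs in costs.values())

def value_of_information(costs, p_state, p_signal, p_state_given_signal):
    """Expected cost saved by observing the signal before choosing an action."""
    without_info = best_expected_cost(costs, p_state)
    with_info = sum(p_signal[z] * best_expected_cost(costs, p_state_given_signal[z])
                    for z in p_signal)
    return without_info - with_info

# Illustrative problem: carry an umbrella (cost 1 either way) or not
# (cost 5 if it rains, 0 if it stays dry), with an optional forecast.
costs = {"umbrella": {"rain": 1, "dry": 1}, "none": {"rain": 5, "dry": 0}}
p_state = {"rain": 0.5, "dry": 0.5}
p_signal = {"wet": 0.5, "clear": 0.5}
p_state_given_signal = {"wet": {"rain": 0.9, "dry": 0.1},
                        "clear": {"rain": 0.1, "dry": 0.9}}
voi = value_of_information(costs, p_state, p_signal, p_state_given_signal)
print(round(voi, 4))  # 0.25
```

The observation is worth making only if this value exceeds what it costs to obtain, measured in the same units.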

Question 3. Perceptrons. (5 points)

Do Exercise 20.15a on p. 761 of Russell & Norvig. You may assume that alpha = 0.1 and w0 = 0.
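
As a refresher, one step of the perceptron learning rule for a threshold unit can be sketched as follows. This is the generic threshold-unit rule, not a solution to the exercise. The function names, the convention that x[0] is a fixed bias input of 1, and the training example are all assumptions made for illustration.

```python
def predict(weights, x):
    """Threshold unit: output 1 if the weighted sum is non-negative, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total >= 0 else 0

def perceptron_update(weights, x, target, alpha=0.1):
    """One step of the rule w_i <- w_i + alpha * (target - output) * x_i."""
    error = target - predict(weights, x)
    return [w + alpha * error * xi for w, xi in zip(weights, x)]

# Illustrative example: x[0] = 1 is the assumed bias input, all weights start at 0.
w = perceptron_update([0.0, 0.0, 0.0], [1, 1, 0], target=0)
print(w)  # [-0.1, -0.1, 0.0]
```

With all weights at zero the unit outputs 1, so a target of 0 gives an error of -1 and only the weights on non-zero inputs change.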

Question 4. Value Iteration and Policy Iteration. (5 points each)

For this problem, you will implement the value iteration and policy iteration algorithms. I've provided a representation for states, a map, and the setup for two problems: the one shown in R&N (and done in class), and a larger problem, the map of which can be found here. In this second problem, the agent moves in the intended direction with P=0.7, and in each of the other 3 directions with P=0.1. Verify that both of your implementations work with both problems. (I'd suggest doing the R&N problem first.)

You may assume R=-0.04 for all non-goal states, and gamma = 0.8.
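
To make the update concrete, here is a minimal standalone sketch of value iteration on a tiny three-state chain. It does not use the provided mdp module: the function signature, the dictionary-based transition table, and the toy problem are all assumptions made for illustration. Only R = -0.04 and gamma = 0.8 come from this assignment.

```python
def value_iteration(states, transitions, terminals, reward=-0.04,
                    gamma=0.8, epsilon=1e-9):
    """Repeat U(s) <- R + gamma * max_a sum_s' P(s' | s, a) * U(s') until stable.

    transitions[s][a] is a list of (probability, next_state) pairs.
    terminals maps each terminal state to its fixed utility.
    """
    utility = {s: terminals.get(s, 0.0) for s in states}
    while True:
        new_utility = dict(utility)
        delta = 0.0
        for s in states:
            if s in terminals:
                continue  # terminal utilities never change
            best = max(sum(p * utility[s2] for p, s2 in outcomes)
                       for outcomes in transitions[s].values())
            new_utility[s] = reward + gamma * best
            delta = max(delta, abs(new_utility[s] - utility[s]))
        utility = new_utility
        if delta < epsilon:
            return utility

# Toy chain a - b - goal with deterministic moves (illustration only).
states = ["a", "b", "goal"]
transitions = {
    "a": {"right": [(1.0, "b")], "stay": [(1.0, "a")]},
    "b": {"right": [(1.0, "goal")], "left": [(1.0, "a")]},
}
utilities = value_iteration(states, transitions, terminals={"goal": 1.0})
print({s: round(u, 3) for s, u in utilities.items()})
# {'a': 0.568, 'b': 0.76, 'goal': 1.0}
```

The sketch computes every new utility from the previous sweep's values; updating in place is another common choice and converges to the same utilities.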

Here's an example of what the code looks like running in the Python interpreter:
>>> import mdp
>>> m = mdp.makeRNProblem()
>>> m.valueIteration()
>>> [(s.coords, s.utility) for s in m.states.values()]
[(0, 0), (1, 0.30052947656142465), (2, 0.47206207850545195), (3,
0.68209220953458682), (4, 0.18120337982169335), (5,
0.34406397771608599), (6, 0.09080843870176547), (7,
0.095490116585228102), (8, 0.18785929363720655), (9,
0.00024908649990546677), (10, 1.0), (11, -1.0)] 
>>> m.policyIteration()
>>> [(s.coords, s.utility, s.policy) for s in m.states.values()]
[(0, 0, None), (1, 0.28005761520403155, 'right'), (2,
0.4690814072745027, 'right'), (3, 0.68184632776188669, 'right'), (4,
0.15435343031111029, 'up'), (5, 0.34377291077136857, 'up'), (6,
0.061864822644220767, 'up'), (7, 0.088791721072110752, 'right'), (8,
0.18680600621029542, 'up'), (9, -0.00075615039456027738, 'left'), (10,
1.0, None), (11, -1.0, None)] 


There are four programming problems described below. You must do one of them, which is worth 10 points.

In addition, you may do up to two others, for up to 5 points extra credit each, to be applied directly to the score of your lowest midterm or final.