Part 3. Personalization. Due 4/7/03
What to turn in:
1. A URL pointing to your agent's user-facing web page. (Your agent doesn't have to be 'live' all the time, but you should put the initial page somewhere where I can look at it.)
2. A writeup including your code, a description of how your agent works, strengths and weaknesses of your agent, and possible improvements. The goal of this document is for you to explain to me what you've done.
3. Email me your agent code. I will also want to see your agent 'in action'; we'll take some time on 4/2 for each person to demo their agent for me.
In parts 1 and 2, you built up the code needed for our book-shopping agent. Now, in part 3, you'll add some rudimentary personalization.
Update your display in the similarity area so that the title and cover image of each similar book are hyperlinks. When the user clicks one of these links, your agent should treat the click as a new search: detailed information about the clicked book should appear in the presentation area, and books similar to it should appear in the similarity area. Note that this involves two new queries: an ASIN query, done over HTTP and using XSLT to format the output for the presentation area, just as your first query did, and a SOAP query that returns results for the similarity area, just as you did previously.
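One way to build those links is sketched below. It assumes your servlet is reachable at the relative URL `agent` and reads the book to look up from an `asin` request parameter; both names are hypothetical, so substitute whatever your servlet actually uses. The cover image and the title each point back to the servlet with the similar book's ASIN, so a click arrives as a new ASIN search.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the HTML for one entry in the similarity area.
// "agent" and the "asin" parameter name are hypothetical placeholders.
class SimilarBookLinks {

    // Escape characters that are special in HTML text and attribute values.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wraps both the cover image and the title in a link that re-queries
    // the agent with this book's ASIN.
    static String linkFor(String asin, String title, String imageUrl) {
        String href = "agent?asin=" + URLEncoder.encode(asin, StandardCharsets.UTF_8);
        String safeHref = escapeHtml(href);
        return "<a href=\"" + safeHref + "\"><img src=\"" + escapeHtml(imageUrl)
                + "\" alt=\"" + escapeHtml(title) + "\"></a> "
                + "<a href=\"" + safeHref + "\">" + escapeHtml(title) + "</a>";
    }
}
```

When the servlet receives a request carrying `asin`, it can run the ASIN query and the SimilaritySearch exactly as it does for the book found by the original title/author search.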
You may have noticed that some of the books in the "similarity" list are not really very similar. In the remainder of this part of the project, we'll add some personalization into our agent to help deal with this.
We'll begin simply; allow your user to indicate a list of genres that he or she likes, and then filter out books in the similarity list that are not in those genres.
Amazon uses BrowseNodes to refer to genres; you can use the ones described in Appendix A of the Amazon Web Services API (they're also included at the bottom of this page). On your web page, give your users an opportunity to specify genres that they like. This can be done via radio buttons or checkboxes within the title/author search, or on a separate 'preferences' page that the user brings up. This is one of those places where there are several possible implementations; part of the assignment is for you to choose a design that balances functionality against usability. For example, you might offer only a few of the more common genres, or you might prefer to offer all of them.
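As one possible design, the sketch below offers a handful of common genres as checkboxes, using the BrowseNode IDs listed at the bottom of this page. The form action `agent` and the field name `genre` are hypothetical, and which genres to show is exactly the functionality-versus-usability choice described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GenrePreferencesForm {

    // A few common genres (display name -> BrowseNode ID from the table below).
    static Map<String, String> commonGenres() {
        Map<String, String> genres = new LinkedHashMap<>();
        genres.put("Literature/Fiction", "17");
        genres.put("Mystery", "18");
        genres.put("SciFi/Fantasy", "25");
        genres.put("Computers/Internet", "5");
        genres.put("History", "9");
        return genres;
    }

    // Renders one checkbox per genre; the checkbox value is the BrowseNode ID.
    static String render(Map<String, String> genres) {
        StringBuilder html = new StringBuilder("<form action=\"agent\" method=\"get\">\n");
        for (Map.Entry<String, String> e : genres.entrySet()) {
            html.append("  <label><input type=\"checkbox\" name=\"genre\" value=\"")
                .append(e.getValue()).append("\"> ")
                .append(e.getKey()).append("</label><br>\n");
        }
        html.append("  <input type=\"submit\" value=\"Save preferences\">\n</form>\n");
        return html.toString();
    }
}
```

Because checkboxes allow several selections, the servlet can read all of them with `request.getParameterValues("genre")`, which returns a `String[]` of the checked values, or `null` if none were checked.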
Your agent will need to store a list of preferred genres, which will be used to filter results. The simplest way to do this is within the servlet. Once the user has specified the genres he or she likes, your agent should drop any book whose genre isn't on this list when it generates the list of similar books.
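A minimal sketch of the filtering step follows. The `Book` class is a hypothetical stand-in for however you represent a parsed similar book, and it assumes each book carries the set of BrowseNode IDs it belongs to. Treating an empty preference list as "no filter" is a design choice made here, not something the assignment requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class GenreFilter {

    // Hypothetical stand-in for one parsed similar book.
    static final class Book {
        final String asin;
        final String title;
        final Set<String> browseNodes; // BrowseNode IDs this book belongs to

        Book(String asin, String title, Set<String> browseNodes) {
            this.asin = asin;
            this.title = title;
            this.browseNodes = browseNodes;
        }
    }

    // Keeps books that share at least one BrowseNode with the preferred set.
    // Assumption: an empty preference set means the user hasn't chosen yet,
    // so nothing is filtered out.
    static List<Book> filter(List<Book> similar, Set<String> preferred) {
        if (preferred.isEmpty()) {
            return new ArrayList<>(similar);
        }
        List<Book> kept = new ArrayList<>();
        for (Book b : similar) {
            if (!Collections.disjoint(b.browseNodes, preferred)) {
                kept.add(b);
            }
        }
        return kept;
    }
}
```

Checking for any overlap, rather than requiring every BrowseNode to match, lets a book listed under several genres through as long as one of them is preferred.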
In this last part, we'll build up a probabilistic model of genres that a user likes, and then use that to generate a list of similar books. We'll use a version of a machine learning algorithm called reinforcement learning to do this. Reinforcement learning associates estimates of value with actions (or genres, in our case).
Our agent's challenge is to show books in genres that match a user's tastes. However, if the agent doesn't know those tastes, it has to learn them. Learning typically involves displaying books in unknown genres and seeing whether the user likes them. There's a catch, though: we want to show the user books in genres we already know he or she likes, but we also want to find out their opinion of new genres. This is known as the exploration-exploitation problem: how much should we explore (suggest new categories) versus exploit (suggest categories known to be good)?
The basic reinforcement learning algorithm is very simple. Start with a list of genres and their associated values. Let's assume that there are n genres. Initially, we have no information about which genres are best, so we give each genre a value of 1/n. (It's helpful to normalize the values so they always sum to 1.) You can think of each value as the relative probability of choosing that genre to display.
Whenever a user requests a book of a certain genre, that provides some evidence that they like that genre, so we increase the value associated with that genre. We specify an increment, delta, along with a learning rate, alpha, between 0 and 1. The learning rate tells us how much weight to give to this new piece of information. The reinforcement learning rule looks like this:
new_value = alpha * (value + delta) + (1 - alpha) * value
After the new value is computed for a genre, you should renormalize all the values so that they sum to 1.
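Expanding the rule shows what it actually does: the alpha * value terms cancel, so each observation simply adds alpha * delta to the genre's value before renormalization. A worked example with four genres, delta = 0.1, and alpha = 1 (the first data point):

```latex
\begin{aligned}
v' &= \alpha(v + \delta) + (1 - \alpha)v = v + \alpha\delta \\
v' &= 0.25 + (1)(0.1) = 0.35 \quad (n = 4,\ v = 1/4) \\
\textstyle\sum &= 0.35 + 3(0.25) = 1.10 \\
v'_{\text{clicked}} &= 0.35 / 1.10 \approx 0.318, \qquad v'_{\text{other}} = 0.25 / 1.10 \approx 0.227
\end{aligned}
```

After renormalizing, the four values again sum to 1, and the clicked genre's share has grown at the expense of the other three.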
A good value for delta is probably around 0.1; you'll want to experiment with this. Alpha (the learning rate) should depend on how much data you've seen: early on you want to do lots of learning, but as you see more data, each point should have a smaller impact. Setting alpha = 1 / (number of data points seen so far) is reasonable.
So now you can use your list of values as the probabilities of including a book of a particular genre. When your similarity list comes back from Amazon, your agent will decide how many similar books to include (somewhere between 5 and 10 is a good number). For each book, it generates a random number between 0 and 1, finds the corresponding genre (each genre's value is the width of its slice of the interval from 0 to 1), and then selects a book from that genre. If there are no similar books in that genre, it tries again. In this way, your agent tends to select books that match the user's previously demonstrated preferences, but occasionally it throws in something out of the ordinary, just to get some new information.
Just to summarize: your agent will have a table (the Hashtable and Map classes are useful for this) that maps BrowseNodes to probabilities. These probabilities should sum to one.
When selecting a list of similar books, draw k random numbers, where k is the number of books to be displayed. For each book to be displayed, iterate through your table of genres, accumulating probabilities, until the accumulated probability is greater than the random number. Then select a similar book from that genre. If there is no similar book in that genre, draw again.
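The selection procedure can be sketched as below. It assumes the similar books have already been grouped into a hypothetical `booksByGenre` map (BrowseNode ID to a list of ASINs). Two choices here are additions of this sketch rather than requirements: a chosen book is removed so it can't appear twice, and genres with zero probability are ignored so the redraw loop always terminates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class GenreSampler {

    // Walk the table, accumulating probabilities, and return the first genre
    // whose running total exceeds r (r drawn from [0, 1)).
    static String pickGenre(Map<String, Double> probs, double r) {
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (cumulative > r) {
                return last;
            }
        }
        return last; // rounding guard: values may sum to slightly under 1
    }

    // Chooses up to `count` books, redrawing whenever the drawn genre has no
    // similar book left. Only genres with positive probability are eligible.
    static List<String> choose(Map<String, Double> probs,
                               Map<String, List<String>> booksByGenre,
                               int count, Random rng) {
        Map<String, List<String>> remaining = new LinkedHashMap<>();
        int available = 0;
        for (Map.Entry<String, List<String>> e : booksByGenre.entrySet()) {
            if (probs.getOrDefault(e.getKey(), 0.0) > 0.0 && !e.getValue().isEmpty()) {
                remaining.put(e.getKey(), new ArrayList<>(e.getValue()));
                available += e.getValue().size();
            }
        }
        List<String> chosen = new ArrayList<>();
        while (chosen.size() < count && available > 0) {
            String genre = pickGenre(probs, rng.nextDouble());
            List<String> candidates = remaining.get(genre);
            if (candidates == null || candidates.isEmpty()) {
                continue; // no similar book in this genre: draw again
            }
            chosen.add(candidates.remove(rng.nextInt(candidates.size())));
            available--;
        }
        return chosen;
    }
}
```

To pick a list length between 5 and 10, `int count = 5 + rng.nextInt(6);` works. If fewer eligible books exist than `count`, this sketch simply returns all of them.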
You may find that Amazon's SimilaritySearch is not giving you a wide enough selection of books. Perhaps you only get a small number of books back, or perhaps they're all in one category. If that happens, you should expand your search.
To do this, notice that each of the similar books Amazon returns has a list of ASINs with it, in the Details/SimilarProducts/Product nodes. These are books that are similar to the similar book, one degree of separation from the book being displayed in the presentation area. Doing a second SimilaritySearch on these books will get you more (and hopefully more diverse) data.
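The bookkeeping for this expansion might look like the sketch below. It assumes your existing SOAP code has already parsed each similar book's Details/SimilarProducts/Product ASINs into a map; the method only decides which second-degree ASINs are worth another SimilaritySearch, and leaves the actual queries to your code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SearchExpansion {

    // similarProducts: first-degree similar book ASIN -> the ASINs listed in
    // its Details/SimilarProducts/Product nodes (parsing is assumed done elsewhere).
    // Returns new ASINs to search, skipping the displayed book and any book
    // already in the first-degree similarity list.
    static Set<String> secondDegree(String displayedAsin,
                                    Map<String, List<String>> similarProducts) {
        Set<String> next = new LinkedHashSet<>();
        for (List<String> asins : similarProducts.values()) {
            for (String asin : asins) {
                if (!asin.equals(displayedAsin) && !similarProducts.containsKey(asin)) {
                    next.add(asin);
                }
            }
        }
        return next;
    }
}
```

Since each returned ASIN costs another query, you may want to cap how many you search on a single request.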
Besides display, the other thing your agent needs to handle is updating the probabilities. When a user clicks on a similar book, update your table using the rule above. Then renormalize the probabilities: sum up all the new probabilities in the table and divide each probability by that sum.
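Putting the pieces together, here is a minimal sketch of the genre table: every BrowseNode starts at 1/n, a click applies the update rule with alpha = 1 / (number of clicks so far), and the table is then renormalized. Ignoring clicks on genres that aren't in the table (without counting them as data points) is an assumption of this sketch.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

class GenreTable {

    private final Map<String, Double> probs = new LinkedHashMap<>();
    private final double delta;
    private int dataPoints = 0;

    // Every genre (BrowseNode ID) starts with value 1/n.
    GenreTable(Collection<String> browseNodes, double delta) {
        for (String node : browseNodes) {
            probs.put(node, 1.0 / browseNodes.size());
        }
        this.delta = delta;
    }

    // Called when the user clicks a similar book in the given genre.
    void recordClick(String browseNode) {
        Double value = probs.get(browseNode);
        if (value == null) {
            return; // assumption: untracked genres are ignored
        }
        dataPoints++;
        double alpha = 1.0 / dataPoints; // learning rate shrinks as data accumulates
        probs.put(browseNode, alpha * (value + delta) + (1 - alpha) * value);
        renormalize();
    }

    // Sum all values, then divide each by the sum so they total 1 again.
    private void renormalize() {
        double sum = 0.0;
        for (double v : probs.values()) {
            sum += v;
        }
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            e.setValue(e.getValue() / sum);
        }
    }

    double probability(String browseNode) {
        return probs.getOrDefault(browseNode, 0.0);
    }

    Map<String, Double> snapshot() {
        return new LinkedHashMap<>(probs);
    }
}
```

With four genres and delta = 0.1, the first click on a genre raises its probability from 0.25 to about 0.318; a second click on the same genre uses alpha = 1/2 and raises it only to about 0.351.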
Book genres and their BrowseNode IDs (genre, BrowseNode ID):
Books, Top Selling 1000
Books, Bargain 45
Books, Audiocassettes 44
Books, Audio CDs 69724
Books, Business 3
Books, Cooking 6
Books, Home/Garden 48
Books, Literature/Fiction 17
Books, Nonfiction 53
Books, Technical 173507
Books, Romance 23
Books, Sports 26
Books, Childrens 4
Books, Engineering 13643
Books, Health 10
Books, Reference 21
Books, Science 75
Books, Biographies 2
Books, Computers/Internet 5
Books, Entertainment 86
Books, History 9
Books, Law 10777
Books, Mystery 18
Books, Religion 22
Books, SciFi/Fantasy 25
Books, Travel 27
Books, Arts & Photography 1
Books, e-books 551440
Books, Women's Fiction 54265