CS 662: AI Programming

Project #2: Designing an Ontology

Assigned: October 25

Due: November 15

Total points: 100

Introduction

In this project, we'll assume that, based on your expertise and skill in AI, you have been hired by Spamazon, an online retailer of books and media, to help develop an ontology. They would like you to design and implement a medium-sized ontology using Protege and the OWL plugin. In designing your ontology, you will be expected to follow the design principles discussed in class and outlined in the paper "Ontology Development 101", which is linked below. In grading the project, I will be particularly interested in the design decisions you make, and your reasons for making them.

To turn in

Place a copy of the .owl file generated by Protege in your submit directory. More importantly, you must also prepare a design document describing your ontology and how it was built. (More on this below.)

Resources

You will find the following documents very helpful:

Software

You will do this project within the Protege ontology development tool. Protege is open-source, and can be downloaded from the Protege homepage. It is installed locally under /home/public/cs662.

Protege-OWL uses the RACER reasoning engine as a backend. RACER uses an HTTP interface to communicate with Protege. There is a version of RACER running at: http://boromir.cs.usfca.edu:8080.

A version of RACER for Linux can also be found in /home/public/cs662/RacerPro-1-8-1, if you prefer to run Racer on localhost.

You can download a trial version of Racer here, if you would prefer to run it at home. Note that you need to request a license and get approved, so it may not be instantaneous (don't wait until 3am the night before to do this!).

Grading

The project will be graded as follows:

Design document

This is primarily a design project, rather than a programming project. You will most likely write little or no code; most of your 'programming' will be within Protege. The point is to give you experience designing and implementing a medium-scale ontology using a full-featured ontology development tool.

As a result, I would like you to prepare a document that discusses your design choices. I'll talk about what the document should contain throughout the project, but, in short, it should contain:

There is no minimum or maximum page limit on this document; however, I think it would be difficult to present this information thoroughly in fewer than five pages. It should be typed or prepared using a word processing program. You are welcome (and encouraged) to use pictures or diagrams to help explain your ideas. I would also encourage you to be precise in your language whenever possible.

Note: This document is half the grade for this project. As a result, I would strongly suggest that you do not wait until the last minute to write it. In fact, I would suggest writing it while you are doing the design, rather than after.

If you are concerned about your ability to express yourself clearly in English, I would even more strongly suggest starting early. I do not expect perfect grammar, but I do expect to be able to understand what you are saying. If you are concerned about your writing skills (whether or not you are a native English speaker), I would suggest that you take advantage of the services provided by the USF Learning and Writing Center. If you're interested, they will look at your work and help you express yourself more clearly. This sort of grammatical or stylistic assistance is permitted; however, you may not have someone else write this document for you or assist you with the technical content. (In other words, having your roommate proofread it is fine. Having your roommate write it for you is not.)

(Note: a secondary goal of this project is to give you experience explaining technical ideas in written English. This is an essential skill in almost any job - it doesn't matter how good a programmer you are if you can't explain what you've done.)

The project itself

For the project, we will pretend that the class has been hired by an online bookseller called Spamazon to develop an ontology that describes their products. They would like to be able to suggest items to users, based on their expressed like or dislike of other items.

Spamazon has four basic categories of items:

You must pick a subcategory or genre within one of these four categories (for example, Mystery Books, Horror Movies, Xbox Games, or Techno Music). Each person must pick a unique subcategory; once you have chosen one, you must send me an email to get it approved. If someone else has already chosen it, or it is not suitable for some other reason, I will suggest some alternatives.

You should choose a domain that is rich enough to make your ontology interesting, but not so large that it will take years to complete. As a rough baseline, your ontology should have at least 25 classes and 75 instances. I would suggest choosing a domain that you have some knowledge about or interest in.

Specifying your domain and usage

To begin, you will want to specify your ontology's domain precisely. For example, who will be using your ontology? What will they use it for? What sorts of relationships will they want to know about? You should prepare a set of at least 10 competency questions that your ontology should be able to answer. These questions should provide a picture of the breadth of your ontology's scope. (In other words, don't have "Who wrote (book X)?" 10 times.)

Your design document should have a section enumerating your competency questions and discussing how they provide sufficient usage examples.

Class and property design

Your design document should also contain a description of the important classes and properties in your ontology. Please keep in mind that listing every single class one by one is not necessarily the best way to present this information. You may find that pictures are better than words at describing the classes and the relationships between them. Jambalaya might be a very effective tool to help you depict class/slot relationships.

In this section, I would also like you to describe any significant design choices that you made in constructing your ontology. For example, why did you decide to use a subclass rather than a property? Why did you decide to use a class rather than a string for a property's value? I'm particularly interested in your thought process here.

This is an example of a poor explanation: "I made a class for undergraduate student and a class for graduate student because there are graduate students and undergraduate students in the domain." Notice that it doesn't say anything about what other modeling possibilities might exist, or what the designer's thought process was.

This is a better explanation: "In modeling graduate students and undergraduate students, I chose to treat these both as subclasses of Student. I also considered using a property within Student called typeOfStudent, but decided that this was not appropriate, because graduate students and undergraduate students each have other traits, such as their rank and the number of credits needed for full-time status, that should be kept distinct."
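
If it helps to see the two options side by side, here is a rough sketch of the same trade-off written as ordinary Python classes. This is only an analogy, not OWL and not anything Protege produces; the credit counts and rank values are made-up placeholders.

```python
from dataclasses import dataclass


# Option 1: a single class, with the kind of student stored as a property value.
@dataclass
class StudentWithType:
    name: str
    type_of_student: str  # e.g. "graduate" or "undergraduate"


# Option 2: subclasses, so each kind of student can carry its own traits.
@dataclass
class Student:
    name: str


@dataclass
class GraduateStudent(Student):
    full_time_credits: int = 9    # placeholder value, for illustration only
    rank: str = "masters"         # placeholder value


@dataclass
class UndergraduateStudent(Student):
    full_time_credits: int = 12   # placeholder value, for illustration only
    rank: str = "freshman"        # placeholder value


grad = GraduateStudent("Ada")
undergrad = UndergraduateStudent("Grace")

# Both are still Students, but each subclass keeps its own distinct traits.
print(isinstance(grad, Student), grad.full_time_credits, undergrad.full_time_credits)
```

With Option 1, traits such as full_time_credits would have to be looked up from the string value somewhere else; with Option 2 they live on the subclass itself. That is the kind of reasoning I want to see written down.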

You don't need to do this for every single class - I just want to know about any interesting design problems that came up in your modeling.

Adding instances

You should then populate your ontology with instances. You will want enough different instances to test the different classes and slots you've created. As mentioned above, 75 is probably a good estimate of the number needed. You are welcome to use external sources, such as Amazon, Allmusic.com, or Gamespot, to collect information; be sure you give them appropriate credit in your design document.

You may create your instances by hand using Protege, or write a program using either the Java API or the Jython API to create instances. Here are examples of how to programmatically import instances using the Protege Java API and Jython. Here is some extra information dealing with the OWL plugin API specifically. Note: You are not required to write code to enter instances; if your Java skills are not very strong, you may find it a very frustrating experience. This information is provided for people who want to play with this aspect of Protege.
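
As one possible illustration of generating instances programmatically, here is a minimal sketch that uses only the Python standard library to write individuals as RDF/XML. It does not use the Protege Java or Jython API. The namespace, the class name MysteryBook, the properties hasTitle and hasAuthor, and the records themselves are all hypothetical placeholders; you would substitute the names from your own ontology.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ONT = "http://example.org/spamazon.owl#"  # hypothetical ontology namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("sp", ONT)


def build_individuals(records):
    """Return an rdf:RDF element holding one typed individual per record."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for rec in records:
        # A typed node: the element name is the class, rdf:ID names the individual.
        ind = ET.SubElement(root, f"{{{ONT}}}{rec['cls']}", {f"{{{RDF}}}ID": rec["id"]})
        # Datatype property: the value is literal text.
        title = ET.SubElement(ind, f"{{{ONT}}}hasTitle")
        title.text = rec["title"]
        # Object property: the value points at another individual.
        ET.SubElement(ind, f"{{{ONT}}}hasAuthor", {f"{{{RDF}}}resource": "#" + rec["author"]})
    return root


# Placeholder data, not real catalog entries.
records = [
    {"cls": "MysteryBook", "id": "ExampleBook1", "title": "Example Title One", "author": "ExampleAuthorA"},
    {"cls": "MysteryBook", "id": "ExampleBook2", "title": "Example Title Two", "author": "ExampleAuthorB"},
]

xml_text = ET.tostring(build_individuals(records), encoding="unicode")
print(xml_text)
```

If you try something like this, keep a backup of your .owl file and check in Protege that the generated individuals line up with the classes and properties you actually defined.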

As you add instances, you will most likely find that there are some weaknesses or problems with your ontology. Include a section in your design document that describes any problems you found as you were creating instances and how you modified your ontology to address these problems. Be as specific as you can.

Consistency Checking

As you build your ontology, you will need to check it for consistency. You will also need to infer relationships that are entailed by the axioms you've entered. A significant advantage of an ontology is that this process can be automated using a reasoner.

You'll be using an external program called Racer to do this checking. Protege is able to talk to Racer via an HTTP interface. The address of this server can be configured under the OWL->Preferences menu. There are three obvious choices:

Your final ontology must be consistent.
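
To get a feel for what "inferring relationships" and "checking consistency" mean, here is a toy sketch in plain Python. It is not how Racer works internally (a real description logic reasoner handles far more than this); it only follows subclass links and flags individuals that end up in two disjoint classes. All class and individual names are made up.

```python
# Each class maps to its direct superclass (hypothetical hierarchy).
SUBCLASS_OF = {"MysteryBook": "Book", "Book": "Item", "Movie": "Item"}

# Pairs of classes declared disjoint: nothing may belong to both.
DISJOINT = [("Book", "Movie")]

# Types asserted directly on each individual.
ASSERTED = {
    "item1": {"MysteryBook"},
    "item2": {"MysteryBook", "Movie"},  # deliberately contradictory
}


def ancestors(cls):
    """Return every superclass of cls, following the subclass links upward."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def inferred_types(individual):
    """Asserted types plus everything entailed by the subclass hierarchy."""
    types = set()
    for cls in ASSERTED[individual]:
        types.add(cls)
        types.update(ancestors(cls))
    return types


def inconsistent_individuals():
    """Individuals whose inferred types include both members of a disjoint pair."""
    bad = []
    for ind in sorted(ASSERTED):
        types = inferred_types(ind)
        if any(a in types and b in types for a, b in DISJOINT):
            bad.append(ind)
    return bad


print(sorted(inferred_types("item1")))
print(inconsistent_individuals())
```

Notice that item2 is never asserted to be a Book; the contradiction only appears once the subclass link from MysteryBook to Book is followed. This is the kind of problem that is easy to miss by hand and that the reasoner will catch for you.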

Querying

You should now be able to encode your competency questions, either by using the QueryTab or by creating classes that answer the questions and using the reasoner. Is your ontology able to answer all of the competency questions you originally posed? Are there other problems you can foresee?
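
As a rough illustration of what answering a competency question amounts to, here is a toy sketch in plain Python over a handful of subject/property/object facts. It is not the QueryTab and not a reasoner; the property names hasAuthor and likes, and all of the individuals, are hypothetical. The question being answered is "Which other items share an author with something this user liked?"

```python
# Facts stored as (subject, property, object) triples. Placeholder data only.
TRIPLES = [
    ("book1", "hasAuthor", "authorA"),
    ("book2", "hasAuthor", "authorA"),
    ("book3", "hasAuthor", "authorB"),
    ("user1", "likes", "book1"),
]


def objects(subject, prop):
    """All values of prop for the given subject."""
    return {o for s, p, o in TRIPLES if s == subject and p == prop}


def subjects(prop, obj):
    """All subjects that have obj as a value of prop."""
    return {s for s, p, o in TRIPLES if p == prop and o == obj}


def suggestions(user):
    """Items that share an author with something the user liked."""
    liked = objects(user, "likes")
    authors = set()
    for item in liked:
        authors |= objects(item, "hasAuthor")
    candidates = set()
    for author in authors:
        candidates |= subjects("hasAuthor", author)
    # Do not suggest items the user already likes.
    return sorted(candidates - liked)


print(suggestions("user1"))
```

If a competency question cannot be phrased as a walk over the properties you defined, that usually points to a missing property or class, which is worth discussing in your design document.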

Summary

Finally, your document should summarize the capabilities of your ontology. Now that you've built it and tested it, what are its potential uses? What audiences would be interested in it? Most importantly, what are your ontology's strengths and weaknesses? Are there concepts or queries that it cannot answer? How would you improve it in version 2.0?