Information Nuggets Project

Overview
One motivation for the semantic web is that users work with and manipulate textual views of information entities, and don't have access to the rich information entities themselves. For instance, when a user gets an email with a person's name in it, he might cut and paste the name and the person's info into his contacts software. If the same name appears in another context, the system doesn't know that the name relates to this contact. When a user enters a comment in a blog about entity X, the entry is just text: X doesn't exist except perhaps as a URL. In both cases, the metadata and associations of the entity are hidden from the user.

Goals
Create a system that makes it easy to create and share information entities, associations, and comments. The system should allow:

* The following entities to be created:

    Person
    Institution
    Conference
    Journal
    Paper
    Book
    Film
    Others...

* The following associations to be created:

    Person authorOf Paper
    Person firstAuthorOf Paper
    Person employeeOf Institution
    Person studentOf Institution
    Paper publishedIn Conference
    Paper publishedIn Journal
    Paper authoredBy Person
    Paper firstAuthoredBy Person
    Institution employs Person
    Institution enrolls Person
    Others...

* Comments, like blog entries, to be created on entities.

* A dialog for creating all metadata entities and associations above. The dialog should allow the user to choose from existing entities when creating associations, and should help the user avoid creating duplicate entries for the same entity (e.g., by entering a name slightly differently).

* A dialog for creating comments on entities. Comments should also have types (e.g., summary, link) and the user should be able to see previous comments.
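The requirements above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the names `Store`, `Entity`, `Comment`, `similar`, and the 0.8 similarity cutoff are hypothetical, not part of any existing system, and the example person and paper are made up. It shows associations stored together with the inverses listed above, typed comments that can be listed later, and a simple name-similarity check a dialog might use to warn about likely duplicates.

```python
import difflib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Inverse pairs taken from the association list in the text.
INVERSES: Dict[str, str] = {
    "authorOf": "authoredBy",
    "firstAuthorOf": "firstAuthoredBy",
    "employeeOf": "employs",
    "studentOf": "enrolls",
}
# Make the lookup work in both directions.
INVERSES.update({inverse: name for name, inverse in list(INVERSES.items())})


@dataclass
class Comment:
    kind: str  # comment type, e.g. "summary" or "link"
    text: str


@dataclass
class Entity:
    kind: str  # entity type, e.g. "Person", "Paper", "Institution"
    name: str
    comments: List[Comment] = field(default_factory=list)


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


class Store:
    """Hypothetical in-memory store; persistence and sharing are out of scope."""

    def __init__(self) -> None:
        self.entities: List[Entity] = []
        self.triples: List[Tuple[Entity, str, Entity]] = []

    def similar(self, kind: str, name: str, cutoff: float = 0.8) -> List[Entity]:
        """Existing entities of the same kind whose names resemble `name`."""
        target = normalize(name)
        return [
            e for e in self.entities
            if e.kind == kind
            and difflib.SequenceMatcher(None, normalize(e.name), target).ratio() >= cutoff
        ]

    def add(self, kind: str, name: str) -> Entity:
        entity = Entity(kind, name)
        self.entities.append(entity)
        return entity

    def associate(self, subject: Entity, predicate: str, obj: Entity) -> None:
        """Record the association and, where one is listed, its inverse."""
        self.triples.append((subject, predicate, obj))
        inverse = INVERSES.get(predicate)
        if inverse is not None:
            self.triples.append((obj, inverse, subject))

    def related(self, subject: Entity, predicate: str) -> List[Entity]:
        return [o for s, p, o in self.triples if s is subject and p == predicate]

    def comments(self, entity: Entity, kind: Optional[str] = None) -> List[Comment]:
        """Previous comments on an entity, optionally filtered by type."""
        return [c for c in entity.comments if kind is None or c.kind == kind]


# Usage with made-up data.
store = Store()
ada = store.add("Person", "Ada Lovelace")
paper = store.add("Paper", "A Sample Paper")
store.associate(ada, "authorOf", paper)
ada.comments.append(Comment("summary", "Met at a workshop."))
print([e.name for e in store.related(paper, "authoredBy")])
print([e.name for e in store.similar("Person", "ada  lovelase")])
```

A dialog could call `similar` before `add` and offer the matches as "did you mean" choices. A real system would likely need stronger matching (initials, name order) than a plain character-similarity ratio.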

WebTop Integration
You may or may not decide to integrate your work with the WebTop system. If you do, here are some issues:

* The dialogs should be integrated with one of the WebTop implementations, and the system should attempt to fill out some of the metadata automatically. For instance, if the user is browsing a web page whose title is set in the HTML, the system should automatically grab that title.

* The data model should be integrated with WebTop's existing data model. In WebTop, everything is a subclass of Document. One subclass is HTMLDocument, one is WordDocument, etc. There is also a subclass called Resource, and preexisting classes for Person and Film. You need not use the existing Person class, but you should definitely create Person and ResearchPaper classes. We may also want a "CreativeWork" class above ResearchPaper and Film.

* WebTop generates some XML data for the data model. The code that reads and generates this XML will need to be updated for the new types of entities. Note that the existing code does not use the emerging RDF standard or RDF schemas such as Dublin Core or FOAF. Modify it so that it does.
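The first two integration points can be illustrated with a rough sketch. WebTop's real classes and constructors are not shown here, so everything below is a stand-in written in Python: only the class names come from the text, while the constructors, the `from_html` helper, `extract_title`, and the placement of Person, CreativeWork, and Film under Resource are assumptions. The sketch shows the proposed hierarchy and how a page title could be picked up automatically from the HTML.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html: str) -> Optional[str]:
    """Return the page title with whitespace collapsed, or None if absent."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


class Document:
    """Stand-in for WebTop's root class (real constructor unknown)."""

    def __init__(self, title: Optional[str] = None) -> None:
        self.title = title


class HTMLDocument(Document):
    @classmethod
    def from_html(cls, html: str) -> "HTMLDocument":
        # Fill in the title metadata automatically from the page itself.
        return cls(title=extract_title(html))


class Resource(Document):
    pass


class Person(Resource):  # placement under Resource is an assumption
    def __init__(self, name: str) -> None:
        super().__init__(title=name)
        self.name = name


class CreativeWork(Resource):  # the proposed common parent
    pass


class ResearchPaper(CreativeWork):
    def __init__(self, title: Optional[str] = None,
                 authors: Optional[List[Person]] = None) -> None:
        super().__init__(title)
        self.authors: List[Person] = list(authors or [])


class Film(CreativeWork):
    pass


page = HTMLDocument.from_html(
    "<html><head><title> A Sample  Page </title></head><body></body></html>"
)
print(page.title)
```

The extracted title would be a default that the dialog pre-fills, which the user can still edit before saving.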

Non-WebTop System
Implement the metadata and association dialogs as a Firefox or IE plugin, allowing the user to create entities as she uses her favorite browser. Use RDF for persistence and standards such as Dublin Core.
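As a rough illustration of RDF persistence, the sketch below writes one paper and its author as RDF/XML using only Python's standard library. The namespace URIs are the published ones for RDF, Dublin Core elements, and FOAF. The function name `paper_to_rdf`, the `example.org` URIs, and the choice of `dc:creator` pointing at a `foaf:Person` are assumptions made for the example, not a decided schema.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to emit readable prefixes instead of ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("dc", DC), ("foaf", FOAF)):
    ET.register_namespace(prefix, uri)


def q(namespace: str, tag: str) -> str:
    """Build the '{namespace}tag' form ElementTree uses for qualified names."""
    return "{%s}%s" % (namespace, tag)


def paper_to_rdf(paper_uri: str, title: str,
                 author_uri: str, author_name: str) -> str:
    """Serialize a paper and its author as a small RDF/XML document."""
    root = ET.Element(q(RDF, "RDF"))

    paper = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): paper_uri})
    ET.SubElement(paper, q(DC, "title")).text = title
    ET.SubElement(paper, q(DC, "creator"), {q(RDF, "resource"): author_uri})

    person = ET.SubElement(root, q(FOAF, "Person"), {q(RDF, "about"): author_uri})
    ET.SubElement(person, q(FOAF, "name")).text = author_name

    return ET.tostring(root, encoding="unicode")


# Usage with made-up URIs and names.
print(paper_to_rdf(
    "http://example.org/paper/1", "A Sample Paper",
    "http://example.org/person/1", "Ada Lovelace",
))
```

A dedicated RDF library would handle more of the model (other serializations, querying), but the output above is enough to see how entities and associations map onto existing vocabularies.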

Issues
One key issue is the format that should be used for persistence. RDF should almost certainly be used, but there are a number of RDF schemas now appearing and vying for attention (Dublin Core, FOAF). We can either define an RDF schema that doesn't use existing ones, or define one that is pieced together from various existing ones. In any case, existing schemas should be surveyed and explored and a reasonable decision made. This survey in itself is an important piece of research.
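To make the "pieced together" option concrete, here is a minimal sketch that maps a few of the project's associations onto terms from existing vocabularies and falls back to a project namespace for the rest. The three mapped terms (`foaf:made`, `dc:creator`, `dcterms:isPartOf`) exist in FOAF and Dublin Core, but choosing them for these associations is an assumption rather than the result of the survey, and the `NUGGETS` namespace and function names are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
NUGGETS = "http://example.org/nuggets#"  # hypothetical project namespace

# Associations with a plausible existing term (an illustrative choice only).
PREDICATES = {
    "authorOf": FOAF + "made",
    "authoredBy": DC + "creator",
    "publishedIn": DCTERMS + "isPartOf",
}


def predicate_uri(association: str) -> str:
    """Reuse an existing vocabulary term if mapped, else use the project's own."""
    return PREDICATES.get(association, NUGGETS + association)


def to_ntriple(subject_uri: str, association: str, object_uri: str) -> str:
    """Format one association as an N-Triples statement."""
    return "<%s> <%s> <%s> ." % (subject_uri, predicate_uri(association), object_uri)


# Usage with made-up URIs.
print(to_ntriple("http://example.org/person/1", "authorOf",
                 "http://example.org/paper/1"))
print(to_ntriple("http://example.org/person/1", "employeeOf",
                 "http://example.org/institution/1"))
```

Keeping the mapping in one table means the survey's outcome can change which terms are reused without touching the code that writes the data.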

Where do entities and comments live? If they live on a user's system, how are they propagated to peers or central registries? How are they shared? For instance, I'd like to use the entities and comments of all computer science professors in my UI. How do I specify this?

Related Work
See the links on Wolber's semantic web resources.