Information Nuggets Project
Overview
One motivation for the semantic web is that users work with and manipulate
textual views of information entities, and don't have access to the rich
information entities themselves. For instance, when a user gets an email with a
person's name in it, he might cut and paste the name and the person's
information into his contacts software. If the same name later appears in
another context, the system doesn't know that the name refers to this contact.
When a user enters a comment in a blog about entity X, the entry is just text: X
doesn't exist except perhaps as a URL. In both cases, the entity's metadata and
associations are hidden from the user.
Goals
Create a system that makes it easy to create and share information entities,
associations, and comments. The system should allow:
* The following entities to be created:
Person
Institution
Conference
Journal
Paper
Book
Film
Others...
the following associations:
Person authorOf Paper
Person firstAuthorOf Paper
Person employeeOf Institution
Person studentOf Institution
Paper publishedIn Conference
Paper publishedIn Journal
Paper authoredBy Person
Paper firstAuthoredBy Person
Institution employs Person
Institution enrolls Person
Others...
and comments, like blog entries, on entities.
* A dialog for creating all of the metadata entities and associations above. The dialog should let the user choose from existing entities when creating associations, and should help the user avoid creating duplicate entries for the same entity (e.g., the same person entered with a slightly different spelling of their name).
* A dialog for creating comments on entities. Comments should also have types (e.g., summary, link), and the user should be able to see previous comments on an entity.
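The entity types, associations, and comment types above amount to a small typed data model, and the duplicate check in the first dialog is essentially fuzzy name matching. The Python sketch below shows one way to represent both. The class and function names, the 0.85 similarity threshold, and the accent-stripping, token-sorting normalization are illustrative assumptions, and `difflib.SequenceMatcher` is just one stand-in for whatever matching a real dialog would use.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Entity types from the list above; "Others..." would extend this set.
ENTITY_TYPES = {"Person", "Institution", "Conference", "Journal",
                "Paper", "Book", "Film"}

# Association name -> (subject type, allowed object types).
ASSOCIATIONS = {
    "authorOf": ("Person", {"Paper"}),
    "firstAuthorOf": ("Person", {"Paper"}),
    "employeeOf": ("Person", {"Institution"}),
    "studentOf": ("Person", {"Institution"}),
    "publishedIn": ("Paper", {"Conference", "Journal"}),
    "authoredBy": ("Paper", {"Person"}),
    "firstAuthoredBy": ("Paper", {"Person"}),
    "employs": ("Institution", {"Person"}),
    "enrolls": ("Institution", {"Person"}),
}

COMMENT_TYPES = {"summary", "link"}  # hypothetical starter set


@dataclass
class Entity:
    id: int
    type: str
    name: str


@dataclass
class Comment:
    entity_id: int
    kind: str
    text: str


def normalize(name: str) -> str:
    """Match key: lowercase, no accents or punctuation, tokens sorted."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(sorted(no_punct.split()))


@dataclass
class Registry:
    entities: dict = field(default_factory=dict)
    links: list = field(default_factory=list)
    comments: list = field(default_factory=list)

    def similar(self, type_: str, name: str, threshold: float = 0.85):
        """Existing entities of the same type whose names nearly match."""
        key = normalize(name)
        return [e for e in self.entities.values()
                if e.type == type_
                and SequenceMatcher(None, key, normalize(e.name)).ratio() >= threshold]

    def add_entity(self, type_: str, name: str) -> Entity:
        """Does not block duplicates; the dialog should call similar() first."""
        if type_ not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {type_}")
        entity = Entity(len(self.entities) + 1, type_, name)
        self.entities[entity.id] = entity
        return entity

    def associate(self, subject: Entity, relation: str, obj: Entity) -> None:
        subject_type, object_types = ASSOCIATIONS[relation]
        if subject.type != subject_type or obj.type not in object_types:
            raise ValueError(f"{subject.type} {relation} {obj.type} is not allowed")
        self.links.append((subject.id, relation, obj.id))

    def comment(self, entity: Entity, kind: str, text: str) -> None:
        if kind not in COMMENT_TYPES:
            raise ValueError(f"unknown comment type: {kind}")
        self.comments.append(Comment(entity.id, kind, text))

    def comments_on(self, entity: Entity):
        """Previous comments, for display in the comment dialog."""
        return [c for c in self.comments if c.entity_id == entity.id]
```

A dialog would call `similar` before `add_entity` and offer any matches to the user. Sorting tokens lets "Garcia, Jose" match an existing "José García" despite the reordering and missing accents.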
WebTop Integration
You may or may not decide to integrate your work with the WebTop system. If
you do, here are some issues to consider:
* The dialogs should be integrated with one of the WebTop implementations, and the system should attempt to fill out some of the metadata automatically. For instance, if the user is browsing a web page whose title is set in the HTML, the system should automatically grab that title.
* The data model should be integrated with WebTop's existing data model. In WebTop, everything is a subclass of Document: one subclass is HTMLDocument, another is WordDocument, and so on. There is also a subclass called Resource, and there are preexisting classes for Person and Film. You need not reuse the existing Person class, but you should definitely create Person and ResearchPaper classes. We may also want a "CreativeWork" class as a common superclass of ResearchPaper and Film.
* WebTop generates XML data for the data model. The code that reads and generates this XML will need to be updated for the new types of entities. Note that the existing code does not use the emerging RDF standard or RDF schemas such as Dublin Core or FOAF. Modify it so that it does.
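As an illustration of the title auto-fill and the move to RDF, the Python sketch below pulls the `<title>` out of a page with the standard library's `html.parser` and emits an RDF/XML description that carries it as a Dublin Core `dc:title`. It does not reflect WebTop's actual XML format or code, which are not described here; `TitleGrabber`, `page_title`, and `describe_page` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Serialize with the conventional rdf: and dc: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


class TitleGrabber(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into text
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def page_title(html):
    """Whitespace-normalized page title, or None if the page has none."""
    grabber = TitleGrabber()
    grabber.feed(html)
    grabber.close()
    title = " ".join("".join(grabber.parts).split())
    return title or None


def describe_page(url, html):
    """RDF/XML for one page, with dc:title pre-filled from its HTML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": url})
    title = page_title(html)
    if title:
        ET.SubElement(desc, f"{{{DC}}}title").text = title
    return ET.tostring(root, encoding="unicode")
```

The dialog would show the extracted title as an editable default rather than saving it silently, since page titles are often noisy.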
Non-Webtop System
Implement the metadata and association dialogs as a Firefox or Internet Explorer
plugin, allowing the user to create entities as she uses her favorite browser.
Use RDF for persistence, along with standards such as Dublin Core.
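The plugin itself would be written in whatever language the browser's extension mechanism requires, but the persistence step is independent of that. A minimal sketch in Python, assuming N-Triples as the on-disk RDF syntax and FOAF and Dublin Core predicates. It handles only the subset of N-Triples it writes itself (IRIs and plain string literals; no blank nodes, datatypes, or language tags), and `IRI`, `dump`, and `load` are hypothetical names.

```python
import re


class IRI(str):
    """Marks a string as an IRI rather than a plain literal."""


FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

# <subject> <predicate> (<object> | "literal") .
LINE = re.compile(r'^<([^>]*)> <([^>]*)> (?:<([^>]*)>|"((?:[^"\\]|\\.)*)") \.$')


def _escape(text):
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def _unescape(text):
    return re.sub(r"\\(.)", lambda m: {"n": "\n", "r": "\r"}.get(m.group(1), m.group(1)), text)


def dump(triples):
    """Serialize (subject, predicate, object) triples as N-Triples text."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if isinstance(o, IRI) else f'"{_escape(o)}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


def load(text):
    """Parse text produced by dump() back into triples."""
    triples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Triples line: {line}")
        s, p, iri, lit = m.groups()
        obj = IRI(iri) if iri is not None else _unescape(lit)
        triples.append((IRI(s), IRI(p), obj))
    return triples
```

Literals are escaped on write, so quotes and newlines in names or comments survive a round trip, and objects keep their IRI-versus-literal distinction.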
Issues
One key issue is the format that should be used for persistence. RDF should
almost certainly be used, but a number of RDF schemas are now appearing and
vying for attention (Dublin Core, FOAF). We can either define an RDF schema that
doesn't use existing ones, or define one that is pieced together from various
existing ones. In either case, existing schemas should be surveyed and explored,
and a reasoned decision made. This survey is itself an important piece of
research.
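One way to start the survey, if the pieced-together option is chosen, is to list each project term next to the existing vocabulary term that covers it and fall back to a project namespace for the rest. The Python sketch below does that for a few terms. The mappings are candidate choices, not settled decisions, and the `http://example.org/nuggets#` namespace, `term`, and `coverage` are placeholders.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
# Placeholder namespace for terms with no obvious existing equivalent.
NUGGETS = "http://example.org/nuggets#"

# Project term -> candidate term from an existing vocabulary.
REUSED = {
    "Person": FOAF + "Person",
    "Institution": FOAF + "Organization",
    "name": FOAF + "name",
    "title": DC + "title",
    "authorOf": FOAF + "made",
    "authoredBy": FOAF + "maker",
}


def term(name: str) -> str:
    """Full IRI for a project term, preferring an existing vocabulary."""
    return REUSED.get(name, NUGGETS + name)


def coverage(names):
    """Split project terms into (reused, still needing our own schema)."""
    reused = [n for n in names if n in REUSED]
    custom = [n for n in names if n not in REUSED]
    return reused, custom
```

`dc:creator` would be an equally plausible choice for `authoredBy`; weighing such alternatives is what the survey should settle. FOAF has no direct equivalent for first authorship, and only indirect ones for employment and study (`foaf:workplaceHomepage` and `foaf:schoolHomepage` point at homepages rather than organizations), so those terms fall through to the project namespace here.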
Where do entities and comments live? If they live on a user's system, how are they propagated to peers or central registries? How are they shared? For instance, if I'd like to use the entities and comments of all computer science professors in my UI, how do I specify this?
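One possible answer to "how do I specify this?" is a declarative rule over contributor profiles, applied to entities and comments pooled from peers or a registry. The Python sketch below assumes each contributor publishes a profile with `department` and `position` fields (hypothetical; a real system might read them from a FOAF file) and leaves open how the data is gathered. All class, function, and rule names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contributor:
    uri: str
    department: str  # hypothetical profile field
    position: str    # hypothetical profile field


@dataclass(frozen=True)
class SharedComment:
    author: str  # URI of the contributing user
    entity: str  # URI of the entity commented on
    text: str


def matches(profile: Contributor, rule: dict) -> bool:
    """True if every field named in the rule has the required value."""
    return all(getattr(profile, key) == value for key, value in rule.items())


def subscribed(comments, contributors, rule):
    """Comments from the pooled data whose author satisfies the rule."""
    profiles = {c.uri: c for c in contributors}
    return [c for c in comments
            if c.author in profiles and matches(profiles[c.author], rule)]


# A declarative rule a user could save, edit, and share with others.
CS_PROFESSORS = {"department": "Computer Science", "position": "Professor"}
```

Keeping the rule as plain data rather than code means it could itself be stored as RDF and passed between users; comments from authors without a known profile are simply excluded.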
Related Work
See the links on Wolber's semantic web resources.