Design of a "Standard" Search API
Overview
Recently, a number of web services have appeared that allow digital libraries to
be queried with keyword searches, citation queries, and other associative
operations. The Google and Amazon APIs are two of the most popular.
Unfortunately, although these services provide similar functionality, each one
defines its own service-specific programming interface. This makes it difficult
to build a meta-search engine, i.e., a system that queries multiple sources,
and it prevents a meta-search engine from offering a dynamic list of
information sources that can all be queried using the same operations.
As part of the Webtop project, we have defined an API that aims to provide a standard set of search operations. This "Associative Source" API is used to wrap the service-specific APIs and thus provide a common programming interface that a meta-search engine (e.g., Webtop) can query.
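The wrapping idea can be illustrated with a minimal sketch. This is not the actual Associative Source API: the names AssociativeSource, keyword_search, Result, and the FakeBookstoreClient stand-in (with its find_items method and ItemTitle/ItemId fields) are all hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    locator: str  # a URL or a site-specific id, depending on the source


class AssociativeSource(ABC):
    """Hypothetical common interface that every wrapped service exposes."""

    name = "unnamed"

    @abstractmethod
    def keyword_search(self, query, max_results=10):
        """Return a list of Result objects for the query."""


class FakeBookstoreClient:
    """Stand-in for a service-specific client; not a real vendor API."""

    _CATALOG = [
        {"ItemTitle": "Digital Libraries", "ItemId": "B001"},
        {"ItemTitle": "Library Search Systems", "ItemId": "B002"},
    ]

    def find_items(self, keywords, count):
        words = keywords.lower().split()
        hits = [item for item in self._CATALOG
                if all(w in item["ItemTitle"].lower() for w in words)]
        return hits[:count]


class BookstoreSource(AssociativeSource):
    """Adapter: translates the common call into the service-specific one."""

    name = "bookstore"

    def __init__(self, client):
        self._client = client

    def keyword_search(self, query, max_results=10):
        items = self._client.find_items(query, max_results)
        return [Result(title=i["ItemTitle"], locator=i["ItemId"]) for i in items]
```

A meta-search engine would then depend only on AssociativeSource, so adding a new service means writing one more adapter rather than changing the engine.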
However, the current Associative Source API was created only as a prototype, without an in-depth analysis of what such an API should contain. The goal of this project is to perform that analysis and use it to specify and build a better version of the API. What should a search API provide? Which operations? How should timeouts be handled?
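To make the timeout question concrete, here is one possible policy, sketched under assumptions: the meta-search engine queries all sources in parallel, waits up to a single deadline, and reports which sources answered, which failed, and which were still pending. The function search_all and the three demo sources are hypothetical; only the standard-library concurrent.futures calls are real.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def search_all(sources, query, timeout_s):
    """Query every source in parallel and stop waiting after timeout_s.

    `sources` maps a source name to a callable that takes the query string.
    Returns (results_by_source, errors_by_source, timed_out_names).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, not_done = wait(futures, timeout=timeout_s)

    results, errors = {}, {}
    for future in done:
        name = futures[future]
        try:
            results[name] = future.result()
        except Exception as exc:  # a failing source must not sink the others
            errors[name] = exc
    timed_out = sorted(futures[f] for f in not_done)
    # Do not block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    return results, errors, timed_out


# Demo sources (assumptions for illustration only).
def fast_source(query):
    return [query + " (fast hit)"]


def slow_source(query):
    time.sleep(0.5)
    return [query + " (late hit)"]


def broken_source(query):
    raise RuntimeError("service unavailable")
```

Other policies are equally plausible, for example a per-source timeout declared by each source, or streaming late results to the client as they arrive. Deciding among them is part of the analysis this project calls for.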
Goals
1. Compare and contrast existing library-specific search APIs: Google,
Amazon, Webtop, Feedster, and others. Write a taxonomy of these APIs,
and detail the issues encountered.
2. Compare and contrast existing metasearch systems. What sources do they provide? Do they provide a dynamic list of sources, and if so, how is this facilitated? Here are a few to check: Metasearch, Firefox, A9.
3. Formulate a specification for the ultimate search API.
4. Implement the API within Webtop by
a. Implementing SOAP and REST-ful web services for Google, Amazon, and
Feedster that conform to the API.
b. Modifying the Webtop client to make calls using the new API.
c. Creating sample code that others can download and modify
to create services on top of their own digital library.
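The "dynamic list of sources" raised in goal 2 could be supported by something as simple as a registry that sources are added to and removed from at run time. The sketch below is one hypothetical shape for it; SourceRegistry and its method names are assumptions, not part of Webtop.

```python
class SourceRegistry:
    """Hypothetical run-time registry of searchable sources."""

    def __init__(self):
        self._sources = {}  # name -> callable(query) -> list of results

    def register(self, name, search_fn):
        if name in self._sources:
            raise ValueError(f"source already registered: {name}")
        self._sources[name] = search_fn

    def unregister(self, name):
        # Removing an unknown source is treated as a no-op.
        self._sources.pop(name, None)

    def list_sources(self):
        """Names a client can show to the user, in a stable order."""
        return sorted(self._sources)

    def search(self, query, only=None):
        """Run the same operation against all (or the selected) sources."""
        names = self.list_sources() if only is None else only
        return {name: self._sources[name](query) for name in names}
```

Because every registered source accepts the same call, the client can offer whatever list_sources() returns without knowing anything service-specific.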
Issues
There are two major types of web services: SOAP and REST-ful. Can the API be
defined in a transport-neutral way, so that versions of both types can be
derived from the same definition?
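One way to explore this question is to define the request once as plain data and derive each wire format from it. The sketch below assumes a hypothetical SearchRequest with two fields, hypothetical parameter names (q, max), and a hypothetical keywordSearch operation element; only the SOAP 1.1 envelope namespace and the standard-library calls are real.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import urlencode

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


@dataclass(frozen=True)
class SearchRequest:
    """Transport-neutral description of one search call (hypothetical)."""
    query: str
    max_results: int = 10


def to_rest_url(base_url, req):
    """Render the request as a REST-style GET URL."""
    return base_url + "?" + urlencode({"q": req.query, "max": req.max_results})


def to_soap_envelope(req):
    """Render the same request as a minimal SOAP 1.1 envelope string."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, "keywordSearch")  # assumed operation name
    ET.SubElement(operation, "query").text = req.query
    ET.SubElement(operation, "maxResults").text = str(req.max_results)
    return ET.tostring(envelope, encoding="unicode")
```

For example, to_rest_url("http://example.org/search", SearchRequest("digital library", 5)) yields "http://example.org/search?q=digital+library&max=5", while to_soap_envelope carries the same two fields inside the envelope body.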
Can advanced search and ranking parameters be part of the API? Flexibility and simplicity are competing interests here.
STARTS, from Stanford, and SDARTS, which builds on STARTS and SDLIP, were projects whose purpose was to define such a search API. These should be studied, and the current status of these and any follow-up projects should be researched.
The current Webtop API provides no mechanism for getting the next n results after getting some initial search results. The new API should handle this.
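A common way to support "the next n results" is an offset/limit pair on the request plus a continuation hint on the response. The following is a minimal sketch of that approach, not a proposal for the final API; Page, search_page, and next_offset are hypothetical names, and the in-memory list stands in for a real source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    items: list   # results in this page
    offset: int   # index of the first item within the full result list
    total: int    # total number of hits the source reports

    @property
    def next_offset(self):
        """Offset to request next, or None when there is nothing more."""
        nxt = self.offset + len(self.items)
        if self.items and nxt < self.total:
            return nxt
        return None


def search_page(all_hits, offset=0, limit=10):
    """Return one page of results; `all_hits` stands in for a real source."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    return Page(items=all_hits[offset:offset + limit],
                offset=offset,
                total=len(all_hits))
```

A client keeps calling search_page with the returned next_offset until it is None. Sources that cannot report a total, or whose results change between calls, might need an opaque continuation token instead. That trade-off is one of the issues to analyze.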
Some information sources provide results consisting of URLs. Others, such as Webtop personal webs, need to return actual documents.
Some systems, e.g., Amazon, are based on titles and site-specific ids. Others, such as Google, are based on URLs.
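These differences suggest that a common result type needs to carry a URL, a site-specific id, or an inline document, with at least one of them present. The sketch below shows one possible shape; SearchResult, its field names, and the key method are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    source: str                    # which source produced this result
    title: str
    url: Optional[str] = None      # URL-based sources
    item_id: Optional[str] = None  # sources keyed by site-specific ids
    content: Optional[str] = None  # sources that return the document itself

    def __post_init__(self):
        # A result that cannot be located or read is not useful to a client.
        if self.url is None and self.item_id is None and self.content is None:
            raise ValueError("result needs a url, an item_id, or content")

    def key(self):
        """Identity used to merge duplicates across sources."""
        if self.url is not None:
            return ("url", self.url)           # URLs are globally meaningful
        if self.item_id is not None:
            return (self.source, self.item_id)  # ids only mean something per source
        return (self.source, self.title)
```

Whether one flexible type like this is preferable to separate result types per kind of source is an open design question.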
Related Work
STARTS: Stanford Proposal for Internet Meta-Searching
A Paepcke
SDLIP + STARTS = SDARTS: A Protocol and Toolkit for Metasearching
N Green, PG Ipeirotis, L Gravano
Extending SDARTS: Extracting Metadata from Web Databases and Interfacing
with the Open Archives …
PG Ipeirotis, T Barry, L Gravano
The (digital) library world also has its own ideas for collection and search standards. See:
OAI: The Open Archives Initiative
Arc - An OAI Service Provider for Digital Library Federation
X Liu, K Maly, M Zubair, ML Nelson