Lab 2 - XML

Due - Thursday, February 7, 2008

The goal of this lab is to give you some practice using XML. For this lab, you will write a program that builds an XML database and enables a user to search for particular items. You will also write an XML Schema that specifies the format of the XML document you build.

Grading will be based on correctness (whether your program produces the correct output) as well as the design and documentation. Poorly designed programs (e.g., programs that consist of a giant main method) will be penalized.

Part 1

(45 points)

Implement a Java program that takes a directory such as this one as input, recursively traverses it, unzips the files it contains, parses the files, and generates an XML document of the following format:

<?xml version="1.0" encoding="UTF-8"?>
<BookDB>
  <Book EtextNum="2577">
    <Author First="Hippolyte A." Last="Taine"/>
    <Title>The Ancient Regime</Title>
    <Location Filename="/Users/srollins/teaching_local/cs682-s08/www.gutenberg.org/dirs/etext01/01ocf10.txt"/>
    <ReleaseDate>April, 2001</ReleaseDate>
  </Book>
  <Book EtextNum="2578">
    <Author First="Hippolyte A." Last="Taine"/>
    <Title>The French Revolution, Volume 1.</Title>
    <Location Filename="/Users/srollins/teaching_local/cs682-s08/www.gutenberg.org/dirs/etext01/02ocf10.txt"/>
    <ReleaseDate>April, 2001</ReleaseDate>
  </Book>
  <Book EtextNum="2579">
    <Author First="Hippolyte A." Last="Taine"/>
    <Title>The French Revolution, Volume 2</Title>
    <Location Filename="/Users/srollins/teaching_local/cs682-s08/www.gutenberg.org/dirs/etext01/03ocf10.txt"/>
    <ReleaseDate>April, 2001</ReleaseDate>
  </Book>
</BookDB>

If you have downloaded content directly from Project Gutenberg, you may ignore any non-txt files.
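
As a starting point for the traversal and unzip steps, here is a minimal sketch. It assumes the archives end in .zip and that extracting each .txt entry next to its archive is acceptable. The class and method names (ZipWalker, findZipFiles, extractTextEntries) are made up for illustration and are not required by the lab. Parsing the extracted files is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipWalker {

    /** Recursively collects every regular file under root whose name ends in .zip. */
    static List<Path> findZipFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".zip"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Extracts the .txt entries of one archive into the archive's own directory. */
    static List<Path> extractTextEntries(Path zipFile) throws IOException {
        List<Path> extracted = new ArrayList<>();
        Path targetDir = zipFile.toAbsolutePath().getParent();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.toLowerCase().endsWith(".txt")) {
                    continue; // non-txt content may be ignored
                }
                // Keep only the file name so an entry cannot escape targetDir.
                Path target = targetDir.resolve(Paths.get(name).getFileName().toString());
                Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                extracted.add(target);
            }
        }
        return extracted;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args[0]); // directory to traverse
        for (Path zip : findZipFiles(root)) {
            for (Path txt : extractTextEntries(zip)) {
                System.out.println("Extracted " + txt);
            }
        }
    }
}
```

Keeping traversal and extraction in separate methods is one way to avoid the giant main method mentioned above.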

If you wish to use a language other than Java, please speak with me first.

Update 1/31/08: For full credit, your program must use the Document method createElement to create new Elements and add them to your XML structure. Programs that simply append strings will not receive full credit.
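
To illustrate the createElement requirement, here is a minimal sketch that builds one Book entry with the standard DOM API and serializes it. The class name BookDbBuilder and its methods are hypothetical, and the values passed in main are copied from the sample above. How you obtain those values from the parsed files is up to you.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BookDbBuilder {

    /** Creates an empty document whose root element is BookDB. */
    static Document newDatabase() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("BookDB"));
        return doc;
    }

    /** Appends one Book element, with its children in the order shown in the sample. */
    static Element addBook(Document doc, String etextNum, String first, String last,
                           String title, String filename, String releaseDate) {
        Element book = doc.createElement("Book");
        book.setAttribute("EtextNum", etextNum);

        Element author = doc.createElement("Author");
        author.setAttribute("First", first);
        author.setAttribute("Last", last);
        book.appendChild(author);

        Element titleElement = doc.createElement("Title");
        titleElement.setTextContent(title);
        book.appendChild(titleElement);

        Element location = doc.createElement("Location");
        location.setAttribute("Filename", filename);
        book.appendChild(location);

        Element release = doc.createElement("ReleaseDate");
        release.setTextContent(releaseDate);
        book.appendChild(release);

        doc.getDocumentElement().appendChild(book);
        return book;
    }

    /** Serializes the DOM tree to an indented XML string. */
    static String serialize(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDatabase();
        addBook(doc, "2577", "Hippolyte A.", "Taine", "The Ancient Regime",
                "etext01/01ocf10.txt", "April, 2001"); // shortened path, for illustration only
        System.out.println(serialize(doc));
    }
}
```

Because the text goes in through setTextContent and setAttribute, characters such as & and < in titles are escaped by the serializer, which is one practical reason to prefer this over appending strings.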

Part 2

(40 points)

Implement a program that will read your XML database from a file, build a DOM tree, and allow the user to search it. The user should be able to provide a keyword and specify whether to search by author or title. Your program will print the titles and authors of all Book elements that contain the keyword in the appropriate element. I recommend that you use XPath to help you with your search.
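
One possible shape for the XPath-based search is sketched below. It assumes the document format shown in Part 1, treats a keyword match as a case-sensitive substring match, and checks both the First and Last attributes when searching by author. Those choices, along with the class name BookSearch and the "Title by First Last" output format, are assumptions rather than requirements.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class BookSearch {

    /** Parses the XML database file into a DOM tree. */
    static Document load(File file)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
    }

    /** Returns "Title by First Last" for every Book matching the keyword. */
    static List<String> search(Document doc, boolean byAuthor, String keyword)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the keyword as the XPath variable $kw instead of pasting it into
        // the expression, so quotes in the keyword cannot break the query.
        xpath.setXPathVariableResolver(
            name -> "kw".equals(name.getLocalPart()) ? keyword : null);

        String expression = byAuthor
            ? "/BookDB/Book[Author[contains(@First, $kw) or contains(@Last, $kw)]]"
            : "/BookDB/Book[contains(Title, $kw)]";

        NodeList books = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> results = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            Node book = books.item(i);
            // Relative expressions are evaluated against the matched Book node.
            String title = xpath.evaluate("Title", book);
            String first = xpath.evaluate("Author/@First", book);
            String last = xpath.evaluate("Author/@Last", book);
            results.add(title + " by " + first + " " + last);
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Usage (hypothetical): java BookSearch bookdb.xml author|title keyword
        Document doc = load(new File(args[0]));
        boolean byAuthor = args[1].equalsIgnoreCase("author");
        for (String line : search(doc, byAuthor, args[2])) {
            System.out.println(line);
        }
    }
}
```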

Part 3

(15 points)

Implement an XML Schema for your book database. The BookDB element will contain 0 or more Book elements. All subelements of Book are required and must be specified in order. You must be able to validate your database using a schema validator such as this one.
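
If you would like to check your schema locally as well, the sketch below uses the standard javax.xml.validation API to validate an XML file against a schema file and collect any errors. The class name SchemaCheck and the command-line arguments are hypothetical, and a local check like this is an optional extra, not a substitute for the validator mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaCheck {

    /**
     * Validates xml against the schema in xsd.
     * Returns one message per validation error; an empty list means the document is valid.
     * A document that is not well-formed causes a SAXException instead.
     */
    static List<String> validate(File xsd, File xml) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);
        Validator validator = schema.newValidator();

        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // Warnings do not make the document invalid, so they are ignored here.
            }

            @Override
            public void error(SAXParseException e) {
                // Record the error and keep going so all problems are reported.
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });

        validator.validate(new StreamSource(xml));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Usage (hypothetical): java SchemaCheck bookdb.xsd bookdb.xml
        List<String> problems = validate(new File(args[0]), new File(args[1]));
        if (problems.isEmpty()) {
            System.out.println("Document is valid.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```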

Due 5:30PM - Thursday, February 7, 2008

  1. Submit all of your code, along with readmes, to your submit directory at /home/submit/cs682-s08/username.

Note: No portion of your code may be copied from any other source, including a textbook, a web page, or another student (current or former). You must provide citations for any sources you have used in designing and implementing your program.

Sami Rollins