Databases and Java

This lecture provides an introduction to databases and their use with the Java language.

 [Useful resource:
@(http://www.eveandersson.com/arsdigita/books/panda/databases-choosing,
Andersson DB book) and also
@(http://www.kitebird.com/articles/jdbc.html, Writing JDBC
Applications with MySQL)]

Question: what's wrong with files for storing data such as (download logs from jGuru):

<<
20020901_00.33.12 201927  j@infonomicon.com true
20020901_01.57.16 23349  m@web.de false
20020901_02.51.58 201927  a@attbi.com false
...
>>

Can use grep / awk to find stuff and can easily read from shell.  But, is hard to pull back into a program and can't do anything but simple queries.  For example, no way to do merged queries.

#### Database concepts

### Accessing db from cmd-line

Command line tool is SQL shell:
<<
mysql -p -h sql parrt
>>

Which tries to connect to the mysql database on host {sql} using
database called {parrt}.

### Tables

Tables are just rows of data.  Each row has multiple columns with different types.  For example, here is a table called {consumption}:

<<
CREATE TABLE consumption (
  programmer VARCHAR(100) NOT NULL,
  day ENUM('Mon','Tue','Wed','Thu','Fri') NOT NULL,
  cups INTEGER
);
>>

[
*Programmer* |	*Day* | *Cups*
----
Ito	| Mon	| 1
----
Chu	| Mon	| 2
----
Parr	| Tue	| 8
----

Long	| Tue	| 2
----
Chen	| Tue	| 3
----
Chen	| Wed	| 2
----
Wiryoma	| Thu	| 3
----
Tjun	| Thu	| 1
----
Witchel	| Fri	| 9
----
Zheng	| Fri	| 3
----
Yang	| Fri	| 4
]

Note that you can have duplicates but it's not a good idea here.

Do a show tables:

<<
mysql> show tables;
+----------------+
| Tables_in_demo |
+----------------+
| consumption    |
+----------------+
>>

and can ask for actual table back:

<<
mysql> show create table consumption;
...
consumption | CREATE TABLE `consumption` (
  `programmer` varchar(100) NOT NULL default '',
  `day` enum('Mon','Tue','Wed','Thu','Fri') NOT NULL default 'Mon',
  `cups` int(11) default NULL
) TYPE=MyISAM
...
>>

### Inserting and updating data

<<
INSERT INTO consumption (programmer,day,cups) VALUES ('Ito','Mon',1);
INSERT INTO consumption VALUES ('Chu','Mon',2);
INSERT INTO consumption VALUES ('Parr','Tue',8);
>>

### simple queries

<<
mysql> select * from consumption;
+------------+-----+------+
| programmer | day | cups |
+------------+-----+------+
| Ito        | Mon |    1 |
| Chu        | Mon |    2 |
| Parr       | Tue |    8 |
+------------+-----+------+
>>

<<
mysql> SELECT cups FROM consumption where programmer='Parr';
+------+
| cups |
+------+
|    8 |
+------+
>>

<<
mysql> SELECT SUM(cups) FROM consumption WHERE day='Mon';
+-----------+
| SUM(cups) |
+-----------+
|         3 |
+-----------+
>>

### Column data types

## Strings (TEXT type)

See http://www.mysql.com/doc/en/String_syntax.html but more importantly http://www.mysql.com/doc/en/String_types.html for type info.

You can use {TEXT} type for large strings.  VARCHAR is good for up to
255 in MySQL.  Note that this stuff is always database-dependent!
{CHAR} pads with spaces.  {TEXT} is case insensitive in MySQL.

## ENUM

Non-standard type but very useful.  (mysql doc: enumerated explicitly in the column specification at table creation time).  For example,

<<
create table resource_category (
    id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    category VARCHAR NOT NULL,    
    PRIMARY KEY (id)
);
>>

then you must fill at database creation time.  Or with enum:

<<
create table resource_category (
    id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    category ENUM('course', 'software', 'article') NOT NULL,
    PRIMARY KEY (id)
);
>>

## NULL

Used to store "nothing" in a column.  For example, if someone has no phone number, the column value for that row should be NULL.

<<
INSERT INTO Person (phone) VALUES (NULL);
>>

## DATETIME

http://www.mysql.com/doc/en/Date_and_time_types.html

Basically "YYYY-MM-DD hh:mm:ss"; e.g., "2002-09-29 00:00:00".  The
zero values you might want to use are:

<<
DATETIME '0000-00-00 00:00:00'
DATE	'0000-00-00'
>>

If you create a table like:

<<
create table users (
  created DATETIME not null,
  name VARCHAR(50)
);
>>

then you can insert values like this:

<<
insert into users values ('2005-10-26', 'Ter');
>>

### Relationships and foreign keys

Storing a row of data per key is easy.  How do you handle multiple
values though for each row?  For example, multiple phone numbers per
person or all the FAQ entries associated with a FAQ descriptor?  Have
to use another table and a _foreign key_.

First, look at storing multiple phone numbers per user:

<<
create table users (
	id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
	name VARCHAR(50),
	PRIMARY KEY (id)
);

create table phones (
	id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
	phone VARCHAR(20),
	userid SMALLINT UNSIGNED NOT NULL REFERENCES users(id),
	PRIMARY KEY (id)
);
>>

Graphically:

%image(users-foreign.png)

Each {phones} table entry points back to the user that "owns" this
record.  Notice that the user does not point at the phones it has;
rather, the phones point back at their "owners".

Then insert some data:

<<
insert into users (name) values ('Ter');
insert into users (name) values ('Tom');

insert into phones (phone,userid) values ('x5707',1);
insert into phones (phone,userid) values ('x6530',1);
insert into phones (phone,userid) values ('555-4321',2);
insert into phones (phone,userid) values ('555-1212',2);
insert into phones (phone,userid) values ('911',2);
>>

MySQL will show the following tables:

<<
mysql> select * from users;
+----+------+
| id | name |
+----+------+
|  1 | Ter  | 
|  2 | Tom  | 
+----+------+
2 rows in set (0.00 sec)

mysql> select * from phones;
+----+----------+--------+
| id | phone    | userid |
+----+----------+--------+
|  1 | x5707    |      1 | 
|  2 | x6530    |      1 | 
|  3 | 555-4321 |      2 | 
|  4 | 555-1212 |      2 | 
|  5 | 911      |      2 | 
+----+----------+--------+
5 rows in set (0.00 sec)
>>

So, how can you get the list of phones per user?  You might be tempted
to try the following query to get data from multiple tables:

<<
mysql> select name,phone from users,phones;
+------+----------+
| name | phone    |
+------+----------+
| Ter  | x5707    | 
| Tom  | x5707    | 
| Ter  | x6530    | 
| Tom  | x6530    | 
| Ter  | 555-4321 | 
| Tom  | 555-4321 | 
| Ter  | 555-1212 | 
| Tom  | 555-1212 | 
| Ter  | 911      | 
| Tom  | 911      | 
+------+----------+
>>

Ooops! Gives outer product, but want inner product.

<<
mysql> select name,phone from users inner join phones on users.id=phones.userid; 
+------+----------+
| name | phone    |
+------+----------+
| Ter  | x5707    | 
| Ter  | x6530    | 
| Tom  | 555-4321 | 
| Tom  | 555-1212 | 
| Tom  | 911      | 
+------+----------+
5 rows in set (0.00 sec)
>>

And then can grab phones for just one user:

<<
mysql> select name,phone from users inner join phones on users.id=phones.userid where name='Tom';
+------+----------+
| name | phone    |
+------+----------+
| Tom  | 555-4321 | 
| Tom  | 555-1212 | 
| Tom  | 911      | 
+------+----------+
3 rows in set (0.00 sec)
>>

Here's another variation where a Person has multiple roles and vacation days.

%image(foreign-key.gif,480)

As a 3rd example, consider building tables to manage FAQ
entries.  You want to define the FAQs (Dogs, Cats, ...) and then have
entries within those FAQs.

<<
CREATE TABLE FAQ (
    id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    name CHAR(60) NOT NULL, -- e.g., Dogs, Birds, Cats
    PRIMARY KEY (id)
);

CREATE TABLE ENTRY (
    id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    created DATETIME NOT NULL,
    question VARCHAR(255) NOT NULL,
    answer TEXT NOT NULL,
    faqid SMALLINT UNSIGNED NOT NULL REFERENCES FAQ(id),
    PRIMARY KEY (id)
);

INSERT INTO FAQ (name) VALUES ('Dogs');
INSERT INTO FAQ (name) VALUES ('Cats');
INSERT INTO ENTRY (created,question,answer,faqid)
  VALUES ('2002-09-30','What is a terrier?','A small dog',1);
INSERT INTO ENTRY (created,question,answer,faqid)
  VALUES ('2002-09-30','Why are some dogs smaller than cats?','a mystery!',1);
INSERT INTO ENTRY (created,question,answer,faqid)
  VALUES ('2002-09-30','Why are cats better than dogs?','they do not bark',2);
>>

Then you might do queries like:

<<
mysql> select * from FAQ;
+----+------+
| id | name |
+----+------+
|  1 | Dogs |
|  2 | Cats |
+----+------+
>>

<<
mysql> select * from ENTRY;
+----+---------------------+--------------------------------------+------------------+-------+
| id | created             | question                             | answer           | faqid |
+----+---------------------+--------------------------------------+------------------+-------+
|  1 | 2002-09-30 00:00:00 | What is a terrier?                   | A small dog      |     1 |
|  2 | 2002-09-30 00:00:00 | Why are some dogs smaller than cats? | a mystery!       |     1 |
|  3 | 2002-09-30 00:00:00 | Why are cats better than dogs?       | they do not bark |     2 |
+----+---------------------+--------------------------------------+------------------+-------+
>>

To {JOIN} the tables to find all "Dogs" FAQ entries do this:

<<
mysql> SELECT question from FAQ,ENTRY
    ->   WHERE ENTRY.faqid=FAQ.id AND
    ->         FAQ.name='Dogs';
+--------------------------------------+
| question                             |
+--------------------------------------+
| What is a terrier?                   |
| Why are some dogs smaller than cats? |
+--------------------------------------+
>>

Same as

<<
SELECT question from FAQ INNER JOIN ENTRY ON ENTRY.faqid=FAQ.id
  WHERE FAQ.name='Dogs';
>>

Add more data then do a full join.

<<
INSERT INTO ENTRY (created,question,answer,faqid)
  VALUES ('2002-09-30','What is a bulldog?','An ugly dog',1);
INSERT INTO ENTRY (created,question,answer,faqid)
  VALUES ('2002-09-30','Are cats Human?','No they just act like it',2);
>>

Just referring to comma-separated tables does a *JOIN* that duplicates data like a cross product in math:

<<
mysql> select * from FAQ,ENTRY;
+----+------+----+---------------------+--------------------------------------+--------------------------+-------+
| id | name | id | created             | question       | answer                   | faqid |
+----+------+----+---------------------+--------------------------------------+--------------------------+-------+
|  1 | Dogs |  1 | 2002-09-30 00:00:00 | What is a terrier?        | A small dog              |     1 |
|  2 | Cats |  1 | 2002-09-30 00:00:00 | What is a terrier?       | A small dog              |     1 |
|  1 | Dogs |  2 | 2002-09-30 00:00:00 | Why are some dogs smaller than cats? | a mystery!               |     1 |
|  2 | Cats |  2 | 2002-09-30 00:00:00 | Why are some dogs smaller than cats? | a mystery!               |     1 |
|  1 | Dogs |  3 | 2002-09-30 00:00:00 | Why are cats better than dogs?       | they do not bark         |     2 |
|  2 | Cats |  3 | 2002-09-30 00:00:00 | Why are cats better than dogs?       | they do not bark         |     2 |
|  1 | Dogs |  4 | 2002-09-30 00:00:00 | What is a bulldog?        | An ugly dog              |     1 |
|  2 | Cats |  4 | 2002-09-30 00:00:00 | What is a bulldog?       | An ugly dog              |     1 |
|  1 | Dogs |  5 | 2002-09-30 00:00:00 | Are cats Human?       | No they just act like it |     2 |
|  2 | Cats |  5 | 2002-09-30 00:00:00 | Are cats Human?       | No they just act like it |     2 |
+----+------+----+---------------------+--------------------------------------+--------------------------+-------+
10 rows in set (0.00 sec)
>>

Every row of FAQ is paired up with every row of ENTRY so there are n\*m rows in the join for |FAQ|=n and |ENTRY|=m.

*NATURAL JOIN* matches up id's and combines tables but it is not always what you want (matches up {id} column):

<<
mysql> select * from FAQ NATURAL JOIN ENTRY;
+----+------+---------------------+--------------------------------------+-------------+-------+
| id | name | created             | question  | answer      | faqid |
+----+------+---------------------+--------------------------------------+-------------+-------+
|  1 | Dogs | 2002-09-30 00:00:00 | What is a terrier?  | A small dog |     1 |
|  2 | Cats | 2002-09-30 00:00:00 | Why are some dogs smaller than cats? | a mystery!  |     1 |
+----+------+---------------------+--------------------------------------+-------------+-------+
2 rows in set (0.00 sec)
>>

What you want is:

<<
mysql> select FAQ.id,name,faqid,ENTRY.id,question,answer
  from FAQ INNER JOIN ENTRY on FAQ.id=ENTRY.faqid;
+----+------+-------+----+--------------------------------------+--------------------------+
| id | name | faqid | id | question                             | answer                   |
+----+------+-------+----+--------------------------------------+--------------------------+
|  1 | Dogs |     1 |  1 | What is a terrier?                   | A small dog              |
|  1 | Dogs |     1 |  2 | Why are some dogs smaller than cats? | a mystery!               |
|  2 | Cats |     2 |  3 | Why are cats better than dogs?       | they do not bark         |
|  1 | Dogs |     1 |  4 | What is a bulldog?                   | An ugly dog              |
|  2 | Cats |     2 |  5 | Are cats Human?                      | No they just act like it |
+----+------+-------+----+--------------------------------------+--------------------------+
>>

### Indices

Indices are used to prevent linear walk of data.  Makes find rows and sorting much faster.  Multi-column indices make find data with

<<
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
>>

much faster as it can grab the data from the index.  If two single-column indices, finds row(s) and fetches data.

Create an index:

<<
CREATE [UNIQUE|FULLTEXT] INDEX index_name
       ON tbl_name (col_name[(length)],... )
>>

such as:

<<
create index person_name_index on person (name);
>>

or a multi-column index.

<<
create index person_name_email_index on person (name,email);
>>

Can use portion of column to reduce mem usage w/o much loss of speed.

## MySQL issues

# Auto increment columns

Often you want to know the ID of the row you just entered.  MySQL
provides the AUTO\_INCREMENT modifier so you don't have to track your
own unique IDs (via db _sequences_ usually)--you just insert rows.
But, you need to know what that ID is to fill in your Java object in
memory usually.  Assume following table:

<<
// from mysql doc
CREATE TABLE person (
    id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name CHAR(60) NOT NULL,
    PRIMARY KEY (id)
);
>>

If you cast one of the JDBC objects to a specific mysql object, it
will tell you the id used to do the insert.  

You can also use another query.  See the mysql doc for
{SELECT LAST_INSERT_ID();}.  Basically, if you do

<<
SELECT LAST_INSERT_ID();
>>

it will return a result set with the ID such as:

<<
mysql> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)
>>

# Storing large items

A {BLOB} is a binary large object.  You could use it to store email attachments or images etc...

@(http://www.jguru.com/faq/view.jsp?EID=742030,How can I insert images into a Mysql database?)

### Transactions and locking

Transaction: do operation then commit concept.  Very useful for multi-stage operations to ensure consistent data in database.  For example, update customer account and delete from inventory in one operation: must be atomic like with threads updating Java memory.  If any part of a transaction fails, then you can {rollback} changes.  If all is well, you can {commit}.  Transactions in MySQL start with {BEGIN;} and end with {COMMIT} or {ROLLBACK}.

If you {LOCK} a table, MySQL allows one write at a time (atomically).  All reads wait until all updates are done then reads.

From mysql doc:

<<
mysql> LOCK TABLES trans READ, customer WRITE;
mysql> SELECT SUM(value) FROM trans WHERE customer_id=some_id;
mysql> UPDATE customer SET total_value=sum_from_previous_statement
    ->        WHERE customer_id=some_id;
mysql> UNLOCK TABLES;
>>

Mysql doc: "Without LOCK TABLES, there is a chance that another thread might insert a new row in the trans table between execution of the SELECT and UPDATE statements."

An additional problem.  With some applications you need to support
extended locking.  For example, to update a person's user record, you
must read then wait a long time for them to hit submit and then must
update the database.  You do not want someone reading the same record
and updating it in between.  This is the classic race condition
associated with "test and set" operations.  You would need to lock tables
to ensure proper atomic update and do application locks so duplication
operations don't start up.

## "Cooperative" Application and object level locks

MySQL provides a nice mechanism for handling the 'edit person record' problem.  Use

<<
SELECT GET_LOCK("lock_faq_77",10); -- timeout=10seconds
...
SELECT RELEASE_LOCK("lock_faq_77");
>>

Note you can halt commencement of duplication operations by checking lock:

<<
SELECT IS_FREE_LOCK("lock_faq_77");
>>

Mysql doc seems to imply only one lock per client connection.  This stinks.  One way around it is to add a lock id and lock\_created column to each row.
Here is some good info on that:
http://www.bitmechanic.com/mail-archives/mysql/current/1000.html

To lock use these queries:

<<
SELECT lockid,whenlocked FROM yourtable WHERE ID=IDtoLock;
UPDATE yourtable SET LOCKID=aLockIDManagedByYourApp,
  WHENLOCKED='currentTimestamp'
  WHERE EID=IDtoLock;
>>

To unlock just set the lock id to {NULL}.  Your application must manage the lock ids and associate a lock id with a user in their session data.  They must be able to perform multiple operations.

#### Using databases from Java

### connecting

MySQL doc:
@(http://mmmysql.sourceforge.net/doc/mm.doc/c106.htm)

1 Load a JDBC driver for desired data source. A data source is any database you have access to through a JDBC driver. In the case of ODBC, the data source must be registered with the ODBC driver manager from the Control Panel. With any other driver you setup the access via vendor-specific procedures.
 

1 Connect to a data source. Once your driver is loaded, you can connect to the specific data source through a URL of the form jdbc:subprotocol:name. 


1 Execute SQL statements. You use SQL commands from Java to get results and then use Java to display or manipulate that data. 


1 Close the connection. When you are done, you close the connection to the database, freeing the resources.

Embodied by:

<<
import java.sql.*;

public class Test {
    // Go to http://www.mysql.com/doc/en/Java.html to get JDBC driver
    // Or directly to get the driver here:
    // http://www.mysql.com/downloads/api-jdbc-stable.html
    // or just see /home/public/cs601 for the jar:
    //   mysql-connector-java-3.0.9-stable-bin.jar
    public static void main(String[] args) throws SQLException {
        String username  = "parrt";
        String password  = "parrt";
        String db  = "parrt";
        try {
            // load driver
            Class.forName("com.mysql.jdbc.Driver").newInstance();
        }
        catch (Exception e) {
            System.err.println("Can't find driver");
            System.exit(1);
        }
        // format "jdbc:mysql://[hostname][:port]/[dbname]"
        String urlString = "jdbc:mysql://sql.cs.usfca.edu/"+db;
        Connection con = DriverManager.getConnection(urlString,
                                                     username,
                                                     password);

	// do your work here

        con.close();
    }
}
>>

I compile with:

<<
$ javac -classpath .:/home/public/cs601/mysql-connector-java-3.0.9-stable-bin.jar Test.java
>>

### executing SQL statements

First you must create a statement to use:

<<
Statement stmt = con.createStatement();
>>

## select statements

<<
ResultSet result = stmt.executeQuery (
  "SELECT programmer, cups " + 
  "FROM consumption ORDER BY cups DESC;");

System.out.println("Programmer\tCups");
while ( result.next() ) {
  String programmer = result.getString("programmer");
  int cups = result.getInt("cups");
  System.out.println(programmer+"\t"+cups);
}
stmt.close();
>>

I execute with:

<<
$ java -cp .:/home/public/cs601/mysql-connector-java-3.0.9-stable-bin.jar Test
Programmer      Cups
Parr    8
Chu     2
Ito     1
>>

## insert and update statements

<<
Statement istmt = con.createStatement();
int n = istmt.executeUpdate(
          "INSERT INTO consumption VALUES ('Acedo','Fri',8)");
System.out.println("Inserted "+n+" records");
>>

## prepared statements

Normally used for efficiency, but MySQL doesn't pre-compile.  Nonetheless there are some useful features.  can use like a template, don't have to build string and also used to store big values like images.  Don't have to escape special char.

<<
PreparedStatement prep = con.prepareStatement(
    "INSERT into consumption (programmer,cups) values (?, ?)");
prep.setString(1, "Jim");
prep.setInt(2, 8);
if (prep.executeUpdate () != 1) {
    throw new Exception ("Bad Update");
}
>>

### Database connection pools

Fast way to talk to the database since connection is very expensive.

http://www.jguru.com/faq/view.jsp?EID=17020

#### Java Object to DB Mapping

Serialization?

Strings for properties example.

### Standard tables

concept: Entities with attributes

Entities mapped to standard tables that "inherit" from:

<<
CREATE TABLE RESOURCE (
        EID INTEGER NOT NULL,
        LOCKID INTEGER,
        WHENLOCKED TIMESTAMP,
        CREATED TIMESTAMP,
        MODIFIED TIMESTAMP,
        AUTHOR INTEGER,
...
);
>>

[
*EID* | *LOCKID* | *WHENLOCKED* | *CREATED* | *MODIFIED* | *AUTHOR*
----
... | ... | ... | ... | ... | ...
]

## Single-valued attributes

Just fields of table

<<
CREATE TABLE FAQ (     
        EID INTEGER NOT NULL,
        LOCKID INTEGER,
        WHENLOCKED TIMESTAMP,
        CREATED TIMESTAMP,
        MODIFIED TIMESTAMP,
        AUTHOR INTEGER,
        QUESTION LONG VARCHAR,
        ANSWER LONG VARCHAR,     
...
);
>>

[
*EID* | ... | *AUTHOR* | *QUESTION* | *ANSWER*
----
_n_ | ... | _a_ | What is JDBC? | ...
]


## Multi-valued attributes

Each attribute maps to separate table mapping EID<->attribute(s).

<<
CREATE TABLE FAQ_TOPIC (
        EID INTEGER NOT NULL,
        TOPIC INTEGER NOT NULL,
        PRIMARY KEY (EID,TOPIC)
);
>>

[
*EID* | *TOPIC*
----
... | ...
]

## Indices

Index for every column of entity tables.  Dual column index for each multi-valued attribute table.

### Database manager

Single thing to change, portability, simplicity, replace with proxy/stub, logging, does the OO <-> DB mapping.

Can do standard queries.

### Taking Advantage of constraints

## Caching of data

Save answer to each query for a while?

Cache whole database in memory and then reload occasionally?

## Caching of pages

Don't recompute pages that don't change much.  Need to refresh intelligently, however.