CS680: Web Systems and Algorithms

Fall 2011 (Last updated: August 5, 2011)
MWF 2:15pm - 3:20pm HR232 (big lecture hall on main floor)

Instructor: Terence Parr

Office hours: Any time the HR531 door is open, or by appointment
First day of class: Wednesday, August 24, 2011
Last day of class: Wednesday, December 7, 2011
Exam 1: October 19
Exam 2: December 7 (last day of class)

Abstract

This course will survey topics, systems and algorithms related to the World Wide Web and data mining. We will read articles and academic papers, learn lots of technology, and build a number of interesting projects. To be a web system architect and programmer, you must be familiar with managing servers and installing software. We will make extensive use of Amazon Web Services throughout the course. You will learn how to present data dynamically in webpages via JavaScript, how to collect and crawl for data, how to store and retrieve vast amounts of data, and how to analyze that data for trends, similarity, and clusters.

Be forewarned that you will need to learn a lot of skills and technology on your own to complete the projects.

Requirements

You should be comfortable with:

CS662 or CS682 would provide extremely useful background. I will also assume that you either know or can teach yourself Python. I will assume you know Java. I will teach you JavaScript, or at least teach you to do cut-and-paste programming in JavaScript, which is what everybody else does when using it for jQuery and Ajax (the language itself is... unpleasant).

Topics

Lecture notes

Web infrastructure

Collecting, storing, and representing data

Data analysis

Lectures

Labs

  1. jQuery/Ajax lab

Projects

  1. 5% Getting started with Amazon Web Services (Due Mon Aug 29)
  2. 10% Proxy server (Due Fri Sept 9)
  3. 5% Building rich clients with jQuery (Due Mon Sept 26)
  4. 10% Twice-cooked Data
  5. 15% Search Engine Construction
  6. 20% Clustering and classifying

Late projects are not accepted.

I will deduct 10% if your program does not run exactly as described in the project specification.

Instruction Format

Class meets three times per week for 15 weeks; each period is 65 minutes (1:05). Instructor-student interaction during lecture is encouraged. "Pop quizzes" may appear during any class.

Grading

Your grade will be computed according to the following relationship:
  5%  Labs/Quizzes/Class participation
  65% Projects
  15% Exam 1 (October 19)
  15% Exam 2 (December 7)
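As a worked illustration, assuming each component is scored on a 0 to 100 scale, the relationship above is a weighted sum. The symbols are shorthand introduced only for this example: L for labs/quizzes/participation, p_1 through p_6 for the six projects (weighted as listed under Projects), and E_1, E_2 for the two exams. The six project weights add up to exactly the 65% project share, and all weights together add up to 100%.

```latex
G = 0.05\,L + \sum_{i=1}^{6} w_i\,p_i + 0.15\,E_1 + 0.15\,E_2,
\qquad (w_1,\dots,w_6) = (0.05,\ 0.10,\ 0.05,\ 0.10,\ 0.15,\ 0.20)

\sum_{i=1}^{6} w_i = 0.65,
\qquad 0.05 + 0.65 + 0.15 + 0.15 = 1.00
```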

Please note that class participation is part of your grade. You must learn to interact with other developers and come up with solutions.

In general, I read all papers, projects, quizzes, etc. twice: once to gauge the average level of the class and a second time to assign scores. During the first pass, I also come up with a scoring strategy for each question.

I consider an "A" grade to be above and beyond what most students have achieved. A "B" grade is an average grade or what you could call "competence" in a business setting. A "C" grade means that you either did not or could not put forth the effort to achieve competence. An "F" grade implies you did very little work or had great difficulty with the class compared to other students.

I will be very strict and set a high standard in my grading, but I will work hard to help you if you are having trouble. Some of you may not get the grade you were hoping for in this class, but I will do everything I can to make sure you learn a lot and have a satisfying educational experience!

Unless you are sick or have a family emergency, I will not change project deadlines or exam times. For example, I will not give you a special final exam just because you want to fly home early. Consult the university academic calendar before making travel plans.

Books and resources

Available free online via USF's subscription to Safari:

I will also present content from Mining the Web by Soumen Chakrabarti and Introduction to Information Retrieval by Manning, Raghavan, and Schütze.

You will no doubt find the following resource useful: Compiling, Executing, and Jar'ing Java Code.

We have academic licenses (so far) for:

CS680 Mailing List

I will send important information to this mailing list, and you are required to sign up for it. To sign up, join the CS680 Google group.

To post, email cs680@cs.usfca.edu.

Miscellaneous

Tardiness. Please be on time for class. It is a big distraction if you come in late.

Academic honesty. You must abide by the copyright laws of the United States and the academic honesty policies of USF. If a particular project allows it, you may use code you find on the net as long as doing so does not violate the software's license. You may not borrow code from other current or previous students. All suspicious activity will be investigated and, if warranted, passed to the Dean of Sciences for action.

Official text from USF: As a Jesuit institution committed to cura personalis, the care and education of the whole person, USF has an obligation to embody and foster the values of honesty and integrity. USF upholds the standards of honesty and integrity from all members of the academic community. All students are expected to know and adhere to the University’s Honor Code. You can find the full text of the Honor Code online.

The golden rule: You must never represent another person's work as your own.