Assignment 5 - Data Exploration through R - (50 points)



Due Date: 4/14/2016 at 11:59pm

Objectives:



  • Familiarize yourself with R
  • Experiment with various built-in visualization techniques
  • Explore a data set using visualization techniques

Download the baseball data



Download the baseball dataset (baseball.data). At the command prompt, type

sh baseball.data

to get four files: data.des.form, a detailed description of the data and its variables; hitter.final, a list of hitters and their stats; pitcher.final, a list of pitchers and their stats; and team.final, which contains team information.

Assemble the data



Assemble the various spreadsheets of data and load them into R. Try loading a small spreadsheet into R first to learn how it works before you load the baseball dataset. Experiment with a couple of built-in visualizations and see which ones work better than others. Think about the properties of the dataset when choosing a visualization.

Note: This data has missing values. For the Salary field in particular, replace each NA with a number such as -1 so that R treats the Salary attribute as a number and not a string.
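
As a minimal sketch of the loading step and the note above, the following R snippet reads a tiny comma-separated sample and replaces missing salaries with -1. The sample rows and the column names (Name, HR, Hits, Salary) are made up for illustration; the actual variable names are listed in data.des.form, and the real files may need read.table with a different separator.

```r
# Hypothetical sample data; the real columns are described in data.des.form.
sample_text <- "Name,HR,Hits,Salary
Player A,10,120,475
Player B,25,150,NA
Player C,5,80,100"

# Read the sample as if it were a small spreadsheet exported to CSV.
hitters <- read.csv(text = sample_text, stringsAsFactors = FALSE)

# Replace missing salaries with the sentinel value -1, as the note suggests.
hitters$Salary[is.na(hitters$Salary)] <- -1

# Inspect the column types to confirm that Salary is numeric.
str(hitters)
```

If you use -1 as a placeholder, remember to filter those rows out (for example with subset(hitters, Salary >= 0)) before plotting or averaging salaries, so the placeholder does not distort the picture.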

Visualization-directed inquiry



To answer the following questions, create a visualization for each one, save it as an image, and upload it to your blog. Please make sure to use ggplot2 in your assignment. Feel free to explore the use of Shiny as well; it lets you add interactivity to otherwise static R visualizations.

  1. Who had the highest number of home runs (HR)?
  2. Who had the maximum number of hits in 1986?
  3. Which team is the second most expensive in the league?
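
One possible way to approach a question like the first one is a sorted bar chart. The sketch below assumes ggplot2 is installed and uses a made-up data frame with hypothetical Name and HR columns; substitute the real hitter data and column names from data.des.form.

```r
library(ggplot2)

# Hypothetical stand-in for the hitter data; names and values are made up.
hitters <- data.frame(
  Name = c("Player A", "Player B", "Player C", "Player D"),
  HR   = c(10, 25, 5, 31)
)

# Order the bars by home runs so the leader is easy to spot.
p <- ggplot(hitters, aes(x = reorder(Name, HR), y = HR)) +
  geom_col() +
  coord_flip() +
  labs(x = "Player", y = "Home runs (HR)")

# Save the chart as an image that can be uploaded to the blog.
ggsave("home_runs.png", plot = p, width = 6, height = 4)

# Cross-check the chart against the data itself.
leader <- hitters$Name[which.max(hitters$HR)]
```

With the full dataset a bar for every player would be unreadable, so it may help to keep only the top few rows (for example with head() on the data sorted by HR) before plotting.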

Specific goal



  • The goal is to use visualization methods to attempt to explain differences in the salaries of major league baseball players and to answer the question "Are players paid according to their performance?" on your blog.
  • Create as many different visualizations as necessary to answer this question.
  • If you find any specific players who are not paid according to their performance, highlight them through a visualization and upload supporting visualizations to your blog.
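
One way to start on the salary question, sketched here under stated assumptions, is a scatter plot of salary against a performance measure with a fitted trend line, flagging players who sit far from the trend. The data frame, the choice of Hits as the performance measure, and the 1.5 standard deviation cutoff are all illustrative assumptions, not part of the assignment.

```r
library(ggplot2)

# Hypothetical data; -1 marks a missing salary, as in the note on NA values.
hitters <- data.frame(
  Name   = paste("Player", LETTERS[1:8]),
  Hits   = c(60, 80, 100, 120, 140, 160, 180, 200),
  Salary = c(150, 210, 240, 310, 900, 390, 460, -1)
)

# Drop players whose salary is unknown before modelling or plotting.
paid <- subset(hitters, Salary >= 0)

# Fit a simple linear trend of salary on hits and keep the residuals.
fit <- lm(Salary ~ Hits, data = paid)
paid$Residual <- resid(fit)

# Flag players far from the trend; the 1.5 * sd cutoff is an arbitrary choice.
paid$Flagged <- abs(paid$Residual) > 1.5 * sd(paid$Residual)

# Scatter plot with the trend line and labels on the flagged players.
p <- ggplot(paid, aes(x = Hits, y = Salary)) +
  geom_point(aes(colour = Flagged), size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(data = subset(paid, Flagged), aes(label = Name), vjust = -1) +
  labs(x = "Hits", y = "Salary", colour = "Far from trend")
```

A single performance measure rarely tells the whole story, so it is worth repeating the plot with other statistics (home runs, years in the league, and so on) and comparing which players stand out each time.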

Submitting the Assignment



Email me a link to your blog post as your submission. In addition to the visualizations, include a short README-style description at the end of your post, along with anything you would like to tell me about the assignment.