Project 1: Atmospheric Data Analysis (v 1.1)
Starter repository on GitHub: https://classroom.github.com/a/LSVOAoYP
This assignment gives you the opportunity to put your C skills to use. We will be analyzing data from the National Oceanic and Atmospheric Administration (NOAA) North American Mesoscale Forecast System to learn more about the climate in a few different states.
C is a great match for data analysis, at least in the speed department: when you’re processing millions of lines of data, you’ll be able to get things done much faster.
In this programming assignment, you will get more familiar with:
- File I/O
- String manipulation routines
- Reading basic tab-delimited value (TDV) files
- Dynamic memory allocation
Here’s a sample run, passing in two test files:
    ./climate data_tn.tdv data_wa.tdv
    Opening file: data_tn.tdv
    Opening file: data_wa.tdv
    States found: TN WA
    -- State: TN --
    Number of Records: 17097
    Average Humidity: 49.4%
    Average Temperature: 58.3F
    Max Temperature: 110.4F on Mon Aug 3 11:00:00 2015
    Min Temperature: -11.1F on Fri Feb 20 04:00:00 2015
    Lightning Strikes: 781
    Records with Snow Cover: 107
    Average Cloud Cover: 53.0%
    -- State: WA --
    Number of Records: 48357
    Average Humidity: 61.3%
    Average Temperature: 52.9F
    Max Temperature: 125.7F on Sun Jun 28 17:00:00 2015
    Min Temperature: -18.7F on Wed Dec 30 04:00:00 2015
    Lightning Strikes: 1190
    Records with Snow Cover: 1383
    Average Cloud Cover: 54.5%
Testing Your Code
There are three data files included to test your code:
data_multi is compressed to save space. To decompress it, use your favorite archive utility or the command line:
Each file contains one record per line with fields separated by tab characters (\t). The columns are organized as follows:
    TN 1424325600000 dn20t1kz0xrz 67.0 0.0 0.0 0.0 101872.0 262.5665
    TN 1422770400000 dn2dcstxsf5b 23.0 0.0 100.0 0.0 100576.0 277.8087
    TN 1422792000000 dn2sdp6pbb5b 96.0 0.0 100.0 0.0 100117.0 278.49207
    TN 1422748800000 dn2fjteh8e80 6.0 0.0 100.0 0.0 100661.0 278.28485
    TN 1423396800000 dn2k0y7ffcup 14.0 0.0 100.0 0.0 100176.0 282.02142
    ...
- State code (e.g., CA, TX)
- Timestamp (time of observation as a UNIX timestamp)
- Geolocation (geohash string)
- Humidity (0 - 100%)
- Snow (1 = snow present, 0 = no snow)
- Cloud cover (0 - 100%)
- Lightning strikes (1 = lightning strike, 0 = no lightning)
- Pressure (Pa)
- Surface temperature (Kelvin)
We will also test your programs with other input files. Note: you can assume that each line in the files will contain all the fields. No need to check for malformed files or lines.
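Since every line is guaranteed to contain all nine fields, one way to read a record is a single sscanf call. The following is only a sketch, not the required approach: the struct name, its field names, and the parse_record function are hypothetical, and it assumes two-letter state codes and geohashes of at most 12 characters, as in the sample data above.

```c
#include <stdio.h>

/* Hypothetical record layout; the names are illustrative only. */
struct climate_record {
    char state[3];          /* two-letter state code plus '\0' (assumed) */
    long long timestamp_ms; /* UNIX timestamp, in milliseconds */
    char geohash[13];       /* up to 12 characters plus '\0' (assumed) */
    double humidity;        /* 0 - 100% */
    double snow;            /* 1 = snow present, 0 = no snow */
    double cloud_cover;     /* 0 - 100% */
    double lightning;       /* 1 = lightning strike, 0 = no lightning */
    double pressure;        /* Pa */
    double surface_temp_k;  /* Kelvin */
};

/* Parses one tab-delimited line into *rec.
 * Returns 1 if all nine fields were read, 0 otherwise. */
int parse_record(const char *line, struct climate_record *rec)
{
    /* A space in the format string matches any run of whitespace,
     * including the tab characters that separate the fields. */
    int matched = sscanf(line, "%2s %lld %12s %lf %lf %lf %lf %lf %lf",
        rec->state, &rec->timestamp_ms, rec->geohash,
        &rec->humidity, &rec->snow, &rec->cloud_cover,
        &rec->lightning, &rec->pressure, &rec->surface_temp_k);
    return matched == 9;
}
```

The width limits (%2s, %12s) keep sscanf from writing past the end of the character arrays. Splitting on tabs with the string manipulation routines would work just as well.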
Hints and Resources
The dataset contains temperatures in Kelvin rather than degrees Fahrenheit. To convert K to F, you can use the following formula:
deg_f = deg_k * 1.8 - 459.67
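As a small illustration, the formula can be wrapped in a helper function. The function name here is hypothetical; only the arithmetic comes from the formula above.

```c
/* Converts a temperature in Kelvin to degrees Fahrenheit. */
double kelvin_to_fahrenheit(double deg_k)
{
    return deg_k * 1.8 - 459.67;
}
```

For example, the first sample record above has a surface temperature of 262.5665 K, which works out to roughly 12.9F. Printing the result with a format such as "%.1fF" gives one decimal place, as in the sample run.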
The times the measurements were taken are expressed as Unix timestamps. These can be converted to string form with the ctime function. The timestamps in the data files are in milliseconds, whereas ctime expects seconds, so you will also need to divide them by 1000 to adjust for the precision:

    #include <time.h>

    /* timestamp must be a time_t holding seconds since the epoch */
    timestamp = timestamp / 1000;
    printf("Time: %s", ctime(&timestamp));
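The timestamp hint can also be packaged as a pair of helpers, sketched below. The function names are hypothetical, and the sketch assumes the raw value was read into a long long before being narrowed to time_t.

```c
#include <stdio.h>
#include <time.h>

/* Converts a millisecond timestamp from the data files into the
 * whole-second time_t value that ctime expects. */
time_t ms_to_time(long long timestamp_ms)
{
    return (time_t) (timestamp_ms / 1000);
}

/* Prints the observation time in the local time zone. The string
 * returned by ctime already ends with a newline, so the format
 * string does not add another one. */
void print_observation_time(long long timestamp_ms)
{
    time_t seconds = ms_to_time(timestamp_ms);
    printf("Time: %s", ctime(&seconds));
}
```

Because ctime formats the time in the local time zone, the printed hour may differ from machine to machine for the same timestamp.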
Finally, be careful when determining which C data types to use in your struct. If you’re wondering what can be stored in different data types, check Wikipedia’s page on C Data Types.
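To make the data type question concrete, here is one possible set of running totals for a single state, with a comment on each type choice. Everything in it (struct name, field names, the stats_add function) is hypothetical and only meant as a sketch. It assumes the struct starts out zero-initialized.

```c
/* Hypothetical per-state running totals; names are illustrative only. */
struct state_stats {
    char code[3];                    /* two-letter state code plus '\0' */
    unsigned long num_records;       /* a count is never negative */
    double humidity_sum;             /* fractional values need floating point */
    double cloud_cover_sum;
    double temp_sum_k;
    double max_temp_k;
    double min_temp_k;
    long long max_temp_time_ms;      /* millisecond timestamps such as
                                        1424325600000 overflow a 32-bit int */
    long long min_temp_time_ms;
    unsigned long lightning_strikes;
    unsigned long snow_records;
};

/* Folds one observation into the totals. The first record for a state
 * always sets both the maximum and the minimum temperature. */
void stats_add(struct state_stats *s, long long time_ms, double humidity,
               double snow, double cloud_cover, double lightning,
               double temp_k)
{
    if (s->num_records == 0 || temp_k > s->max_temp_k) {
        s->max_temp_k = temp_k;
        s->max_temp_time_ms = time_ms;
    }
    if (s->num_records == 0 || temp_k < s->min_temp_k) {
        s->min_temp_k = temp_k;
        s->min_temp_time_ms = time_ms;
    }
    s->num_records++;
    s->humidity_sum += humidity;
    s->cloud_cover_sum += cloud_cover;
    s->temp_sum_k += temp_k;
    if (lightning > 0.0) {
        s->lightning_strikes++;
    }
    if (snow > 0.0) {
        s->snow_records++;
    }
}
```

Averages are then each sum divided by num_records. Keeping temperatures in Kelvin until output is a design choice in this sketch: the conversion to Fahrenheit is linear, so converting the average gives the same result as averaging the converted values.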
The grade breakdown for this assignment is:
- 12pts Correct climate statistics
- 5pts Error handling (missing files, using perror, etc). Note: you can assume the data files we provide do not have any malformed data or missing fields.
- 5pts Support for processing multiple files
- 3pts Function documentation and comments
- 2pts Code style (no commented out blocks of code, unused variables, inconsistent indentation)
- 2pts Correct formatting and unit conversions
- 1pt Program usage message
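The error handling, multiple file, and usage message items above might fit together roughly as follows. This is a sketch under assumptions: the function names and the wording of the usage message are hypothetical, and skipping an unreadable file to continue with the next one is one possible policy, not a stated requirement.

```c
#include <stdio.h>

/* Hypothetical helper: tries to open and read each path in turn.
 * Returns the number of files that could not be opened. */
int process_files(int count, char *paths[])
{
    int failures = 0;
    for (int i = 0; i < count; ++i) {
        printf("Opening file: %s\n", paths[i]);
        FILE *file = fopen(paths[i], "r");
        if (file == NULL) {
            /* perror prints the path followed by the system's
             * description of why fopen failed. */
            perror(paths[i]);
            failures++;
            continue;
        }
        char line[256];
        while (fgets(line, sizeof(line), file) != NULL) {
            /* Parse the record and update that state's statistics here. */
        }
        fclose(file);
    }
    return failures;
}

/* Prints a usage message to stderr (the wording is an assumption). */
void print_usage(const char *program)
{
    fprintf(stderr, "Usage: %s tdv_file1 tdv_file2 ... tdv_fileN\n", program);
}
```

A main function could call print_usage(argv[0]) and exit when argc is less than 2, and otherwise call process_files(argc - 1, argv + 1).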
- Initial version posted (2/6)
- Added hints and dataset info, project released (2/12)