Project 1: Atmospheric Data Analysis (v 1.1)
Starter repository on GitHub: https://classroom.github.com/a/LSVOAoYP
This assignment gives you the opportunity to put your C skills to use. We will be analyzing data from the National Oceanic and Atmospheric Administration (NOAA) North American Mesoscale Forecast System to learn more about the climate in a few different states.
C is a great match for data analysis, at least in the speed department: when you’re processing millions of lines of data, a compiled C program can get the job done much faster than an equivalent script in many interpreted languages.
In this programming assignment, you will get more familiar with:
- File I/O
- String manipulation routines
- Reading basic tab-delimited value (TDV) files
- C structs
- Dynamic memory allocation
- Pointers!
Demo
Here’s a sample run, passing in two test files:
./climate data_tn.tdv data_wa.tdv
Opening file: data_tn.tdv
Opening file: data_wa.tdv
States found: TN WA
-- State: TN --
Number of Records: 17097
Average Humidity: 49.4%
Average Temperature: 58.3F
Max Temperature: 110.4F on Mon Aug 3 11:00:00 2015
Min Temperature: -11.1F on Fri Feb 20 04:00:00 2015
Lightning Strikes: 781
Records with Snow Cover: 107
Average Cloud Cover: 53.0%
-- State: WA --
Number of Records: 48357
Average Humidity: 61.3%
Average Temperature: 52.9F
Max Temperature: 125.7F on Sun Jun 28 17:00:00 2015
Min Temperature: -18.7F on Wed Dec 30 04:00:00 2015
Lightning Strikes: 1190
Records with Snow Cover: 1383
Average Cloud Cover: 54.5%
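The per-state figures above can be built up in a single pass by keeping running totals for each state and dividing only when printing. The sketch below is one possible design, not a required one: struct state_stats, find_or_add_state, add_observation, and print_stats are hypothetical names, and the code assumes temperatures have already been converted to Fahrenheit and timestamps to seconds. States live in a heap array grown with realloc, which also handles input files that mix several states.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical running totals for one state. Sums are stored instead of
 * averages so each record is folded in with a few additions. */
struct state_stats {
    char code[3];        /* two-letter state code plus '\0' */
    long count;          /* number of records seen */
    double humidity_sum;
    double temp_f_sum;
    double cloud_sum;
    double max_temp_f;
    time_t max_time;     /* seconds, ready for ctime() */
    double min_temp_f;
    time_t min_time;
    long lightning;      /* records with a lightning strike */
    long snow;           /* records with snow cover */
};

/* Return the entry for code, appending a zeroed one if it is new.
 * *stats is a heap array grown with realloc and *n is its length.
 * Returns NULL if realloc fails (the old array is left intact). */
struct state_stats *find_or_add_state(struct state_stats **stats, size_t *n,
                                      const char *code)
{
    for (size_t i = 0; i < *n; i++) {
        if (strcmp((*stats)[i].code, code) == 0) {
            return &(*stats)[i];
        }
    }
    struct state_stats *grown = realloc(*stats, (*n + 1) * sizeof(**stats));
    if (grown == NULL) {
        return NULL;
    }
    *stats = grown;
    struct state_stats *s = &grown[*n];
    memset(s, 0, sizeof(*s));
    strncpy(s->code, code, sizeof(s->code) - 1);
    (*n)++;
    return s;
}

/* Fold one observation into s. lightning and snow are 0 or 1. */
void add_observation(struct state_stats *s, double humidity, double temp_f,
                     double cloud, int lightning, int snow, time_t when)
{
    if (s->count == 0 || temp_f > s->max_temp_f) {
        s->max_temp_f = temp_f;
        s->max_time = when;
    }
    if (s->count == 0 || temp_f < s->min_temp_f) {
        s->min_temp_f = temp_f;
        s->min_time = when;
    }
    s->count++;
    s->humidity_sum += humidity;
    s->temp_f_sum += temp_f;
    s->cloud_sum += cloud;
    s->lightning += lightning;
    s->snow += snow;
}

/* Print one state's summary in the demo's format. Assumes count > 0,
 * i.e. at least one add_observation call was made for this entry. */
void print_stats(const struct state_stats *s)
{
    printf("-- State: %s --\n", s->code);
    printf("Number of Records: %ld\n", s->count);
    printf("Average Humidity: %.1f%%\n", s->humidity_sum / s->count);
    printf("Average Temperature: %.1fF\n", s->temp_f_sum / s->count);
    /* ctime's result already ends in '\n' */
    printf("Max Temperature: %.1fF on %s", s->max_temp_f, ctime(&s->max_time));
    printf("Min Temperature: %.1fF on %s", s->min_temp_f, ctime(&s->min_time));
    printf("Lightning Strikes: %ld\n", s->lightning);
    printf("Records with Snow Cover: %ld\n", s->snow);
    printf("Average Cloud Cover: %.1f%%\n", s->cloud_sum / s->count);
}
```

Because realloc may move the array, a pointer returned by find_or_add_state should be treated as valid only until the next call that adds a new state; look the state up again for each record rather than caching the pointer.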
Testing Your Code
There are three data files included to test your code:
- data_tn.tdv
- data_wa.tdv
- data_multi.tdv.gz
data_multi is compressed to save space. To decompress it, use your favorite archive utility or the command line:
gunzip data_multi.tdv.gz
Each file contains one record per line, with fields separated by tab characters (\t). The columns are organized as follows:
TN 1424325600000 dn20t1kz0xrz 67.0 0.0 0.0 0.0 101872.0 262.5665
TN 1422770400000 dn2dcstxsf5b 23.0 0.0 100.0 0.0 100576.0 277.8087
TN 1422792000000 dn2sdp6pbb5b 96.0 0.0 100.0 0.0 100117.0 278.49207
TN 1422748800000 dn2fjteh8e80 6.0 0.0 100.0 0.0 100661.0 278.28485
TN 1423396800000 dn2k0y7ffcup 14.0 0.0 100.0 0.0 100176.0 282.02142
...
Fields:
- State code (e.g., CA, TX, etc.)
- Timestamp (time of observation as a UNIX timestamp)
- Geolocation (geohash string)
- Humidity (0 - 100%)
- Snow (1 = snow present, 0 = no snow)
- Cloud cover (0 - 100%)
- Lightning strikes (1 = lightning strike, 0 = no lightning)
- Pressure (Pa)
- Surface temperature (Kelvin)
We will also test your programs with other input files. Note: you can assume that each line in the files will contain all the fields. No need to check for malformed files or lines.
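Because every line is guaranteed to contain all nine fields, a line can be split with strtok and each field converted with strtod or strtoll. The sketch below assumes a hypothetical struct climate_record and parse_record function; the 16-byte geohash buffer is an assumption based on the 12-character geohashes in the sample lines. Note that strtok modifies its input, so pass it a buffer you own (such as one filled by fgets), and that it collapses runs of tabs, which is harmless here only because no field is ever empty.

```c
#include <stdlib.h>
#include <string.h>

/* Tabs separate fields; '\r' and '\n' are included so a trailing
 * line ending is not glued onto the last field's token. */
#define FIELD_DELIMS "\t\r\n"

/* Hypothetical layout for one line of a .tdv file, in column order. */
struct climate_record {
    char state[3];          /* two-letter state code plus '\0' */
    long long timestamp_ms; /* milliseconds since the UNIX epoch */
    char geohash[16];       /* assumed large enough for the geohash */
    double humidity;        /* percent, 0-100 */
    double snow;            /* 1 = snow present, 0 = no snow */
    double cloud_cover;     /* percent, 0-100 */
    double lightning;       /* 1 = lightning strike, 0 = none */
    double pressure;        /* Pa */
    double surface_temp_k;  /* Kelvin */
};

/* Copy a token into a fixed-size buffer, truncating if necessary. */
static void copy_field(char *dest, size_t size, const char *src)
{
    strncpy(dest, src, size - 1);
    dest[size - 1] = '\0';
}

/* Split line on tabs and fill in *rec. line is modified in place
 * (strtok overwrites each delimiter with '\0'). No validation is done,
 * since the assignment guarantees every field is present. */
void parse_record(char *line, struct climate_record *rec)
{
    copy_field(rec->state, sizeof(rec->state), strtok(line, FIELD_DELIMS));
    rec->timestamp_ms = strtoll(strtok(NULL, FIELD_DELIMS), NULL, 10);
    copy_field(rec->geohash, sizeof(rec->geohash), strtok(NULL, FIELD_DELIMS));
    rec->humidity = strtod(strtok(NULL, FIELD_DELIMS), NULL);
    rec->snow = strtod(strtok(NULL, FIELD_DELIMS), NULL);
    rec->cloud_cover = strtod(strtok(NULL, FIELD_DELIMS), NULL);
    rec->lightning = strtod(strtok(NULL, FIELD_DELIMS), NULL);
    rec->pressure = strtod(strtok(NULL, FIELD_DELIMS), NULL);
    rec->surface_temp_k = strtod(strtok(NULL, FIELD_DELIMS), NULL);
}
```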
Hints and Resources
The dataset contains temperatures in Kelvin rather than degrees Fahrenheit. To convert K to F, you can use the following formula:
deg_f = deg_k * 1.8 - 459.67
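Wrapping the formula in a small helper keeps the conversion in one place. kelvin_to_fahrenheit is a hypothetical name; double is used because the Kelvin readings in the data carry several decimal places.

```c
/* Convert a temperature from Kelvin to degrees Fahrenheit using the
 * formula above: deg_f = deg_k * 1.8 - 459.67 */
double kelvin_to_fahrenheit(double deg_k)
{
    return deg_k * 1.8 - 459.67;
}
```

For example, 273.15 K converts to 32F, and the first sample reading of 262.5665 K converts to about 12.9F when printed with %.1f.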
The times the measurements were taken are expressed as Unix timestamps. These can be converted to string form with the ctime function. The timestamps in the data files are in milliseconds, while ctime expects seconds, so you will also need to divide them by 1000:
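#include <stdio.h>
#include <time.h>
/* timestamp is a time_t holding the millisecond value read from the file */
timestamp = timestamp / 1000;  /* milliseconds -> seconds */
printf("Time: %s", ctime(&timestamp));  /* ctime's string already ends in '\n' */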
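Finally, be careful when determining which C data types to use in your struct: for example, the millisecond timestamps (such as 1424325600000) are far too large to fit in a 32-bit int. If you’re wondering what can be stored in different data types, check Wikipedia’s page on C Data Types.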
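To see why the data type matters: the millisecond timestamps are around 1.4 trillion, while a 32-bit int tops out at 2,147,483,647. One option, sketched below with a hypothetical parse_timestamp_ms helper, is to parse the field with strtoll into a long long (guaranteed to be at least 64 bits) and only then divide by 1000 and store the result as the time_t that ctime expects.

```c
#include <stdlib.h>
#include <time.h>

/* Convert a millisecond timestamp field such as "1424325600000" into
 * seconds since the epoch. The raw value does not fit in a 32-bit int,
 * so it is parsed as a long long (at least 64 bits) before dividing. */
time_t parse_timestamp_ms(const char *field)
{
    long long ms = strtoll(field, NULL, 10);
    return (time_t) (ms / 1000);
}
```

The result can be passed straight to ctime. Keep in mind that ctime formats the time in the local time zone, so the printed hour depends on the time zone setting of the machine running the program.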
Grading
The grade breakdown for this assignment is:
- 12pts Correct climate statistics
- 5pts Error handling (missing files, using perror, etc.). Note: you can assume the data files we provide do not have any malformed data or missing fields.
- 5pts Support for processing multiple files
- 3pts Function documentation and comments
- 2pts Code style (no commented out blocks of code, unused variables, inconsistent indentation)
- 2pts Correct formatting and unit conversions
- 1pt Program usage message
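The error-handling and usage-message items can be met with a few lines of standard C. The sketch below is one possible approach under stated assumptions: print_usage and process_file are hypothetical names, the 256-byte line buffer assumes lines are short (as in the sample data), and reporting an unopenable file with perror and then moving on to the next file, rather than exiting, is a design choice, not a stated requirement.

```c
#include <stdio.h>

/* Print a usage message; prog is normally argv[0]. */
void print_usage(const char *prog)
{
    fprintf(stderr, "Usage: %s tdv_file [tdv_file ...]\n", prog);
}

/* Open one data file and count its lines. On failure, perror() prints
 * the file name plus the reason from errno (e.g., "No such file or
 * directory") and -1 is returned so the caller can skip this file. */
long process_file(const char *path)
{
    printf("Opening file: %s\n", path);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);
        return -1;
    }
    /* Assumes each line fits in line[]; a longer line would be read
     * in pieces and counted more than once. */
    char line[256];
    long lines = 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* parse and accumulate the record here */
        lines++;
    }
    fclose(fp);
    return lines;
}
```

In main, one could check whether argc is less than 2, call print_usage(argv[0]) and return EXIT_FAILURE, and otherwise call process_file for each of argv[1] through argv[argc - 1].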
Changelog
- Initial version posted (2/6)
- Added hints and dataset info, project released (2/12)