Assignment 1: Python and Domain Warmup

Due Date: Feb 11th

This assignment is meant to get you up to speed with Python and also to provide you with experience in the three primary domains we'll be working with this semester. For all assignments, you must use the provided template code as a starting point.

  1. (30%) Word frequencies. One of the primary domains we'll work with this semester is text. The most common approach to dealing with large bodies of text is statistical, which requires counting how many times each word occurs in a document.
    Write a Python function that takes a string as input and returns a dictionary mapping each word in the string to the number of times it occurs. The function should give the caller the option of converting all words to lowercase and of stripping off punctuation. You should also provide a "main" function that takes command-line arguments indicating whether stripping or conversion is desired, as well as the name of a file to use as input. The main function should also let the user pickle the dictionary to a file for later use. Skeleton file to start with: wordfreq.py
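As an illustration of the counting step in item 1, here is a minimal sketch. The function name `word_frequencies` and its `lowercase` and `strip_punct` parameters are assumptions made for this example; they are not taken from the wordfreq.py skeleton, whose interface takes precedence. The sketch splits on whitespace and strips punctuation only from the ends of each word, which is one reasonable reading of "stripping off punctuation", not the only one.

```python
import pickle
import string
from collections import Counter


def word_frequencies(text, lowercase=False, strip_punct=False):
    """Map each whitespace-separated word in text to its count.

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    counts = Counter()
    for word in text.split():
        if lowercase:
            word = word.lower()
        if strip_punct:
            # Strip punctuation from both ends only, so "don't" stays intact.
            word = word.strip(string.punctuation)
        if word:  # a token such as "--" is empty after stripping; skip it
            counts[word] += 1
    return dict(counts)


freqs = word_frequencies(
    "The cat saw the dog. The dog ran!", lowercase=True, strip_punct=True
)

# In-memory pickle round trip, standing in for saving to and loading from a file.
restored = pickle.loads(pickle.dumps(freqs))
```

Command-line handling and writing to disk are left out here. Calling `pickle.dump(freqs, f)` on a file opened in binary mode (`"wb"`) would be the file-based counterpart of the round trip shown.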
  2. (40%) Graphs. Graphs are one of the most commonly used data structures in computer science. In this assignment, we'll represent a graph as an adjacency list. This will be implemented as a dictionary that maps each vertex to a list of its neighbors, along with associated data such as edge weight.
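To make the adjacency-list representation in item 2 concrete, the sketch below builds a small weighted graph as a plain dictionary. Storing each neighbor as a `(neighbor, weight)` tuple, the helper name `add_edge`, and the default of undirected edges are all assumptions made for this example; the provided template code may organize the data differently.

```python
def add_edge(graph, u, v, weight=1, directed=False):
    """Add an edge u -> v to graph (hypothetical helper for illustration)."""
    # setdefault creates the neighbor list the first time a vertex is seen.
    graph.setdefault(u, []).append((v, weight))
    graph.setdefault(v, [])
    if not directed:
        # An undirected edge is stored once in each endpoint's list.
        graph[v].append((u, weight))


graph = {}
add_edge(graph, "a", "b", weight=4)
add_edge(graph, "a", "c", weight=2)
add_edge(graph, "b", "c", weight=1)

# Neighbors of a vertex are read straight from its list.
neighbors_of_a = [v for v, _ in graph["a"]]
```

With this layout, finding the neighbors of a vertex is a single dictionary lookup, while checking whether a particular edge exists requires scanning that vertex's list.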
  3. (30%) ARFF data. Attribute-Relation File Format (ARFF) is a common format for storing data for use in supervised learning problems. It consists of three sections:
    1. The relation being learned about
    2. The attributes and their possible values
    3. The data itself
    For this assignment, you should do the following: