CS 662: AI Programming
Assignment 4: Genetic Algorithms

Assigned: September 21
Due: October 5
60 points total.

What to turn in: A professional-looking report containing graphs and answers for each of the questions indicated below, plus hard copies of the source code.

Also, please put a copy of your code in the submit directory for this class: /home/submit/cs662/(yourname)/assignment4. Everything necessary to run your code should be in this directory. If anything out of the ordinary is needed to run your code, please provide a README.

In this assignment, you'll extend some code that I provide to apply genetic algorithms to two problems: a toy problem involving string patterns, and a larger problem involving the scheduling of nurses to shifts in a hospital. This assignment is less coding-intensive than the previous one, but it also requires you to perform some experiments and synthesize the results. Half of your grade for this assignment (30 points) will be based on your code, and half (30 points) on the report you prepare.

You will also get some exposure to a different style of programming than you may be used to if your background is in Java. Much of this code is designed to be generic. That is, the genetic algorithm code is built to work with any problem representation without change. In addition, each problem can be configured to change the objective function easily, without recoding. To implement this, I have taken advantage of the fact that functions are first-class objects in Python. Functions are passed as parameters, stored in lists, and operated on just like any other variable.
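
As a small illustration of that idea (independent of the provided code; best_of, count_ones, and count_zeros are made-up names), the sketch below passes a scoring function as an ordinary parameter and keeps several functions in a list:

```python
def best_of(population, score):
    """Return the member of `population` with the highest `score`.

    `score` is an ordinary function passed in as an argument.
    """
    return max(population, key=score)


def count_ones(bits):
    return bits.count('1')


def count_zeros(bits):
    return bits.count('0')


# Functions can be stored in a list and looped over like any other value.
scorers = [count_ones, count_zeros]
population = ['0110', '1111', '0000']
for scorer in scorers:
    print(scorer.__name__, best_of(population, scorer))
```

The same best_of works with any scoring function without modification, which is the property the GA code relies on.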

To begin:

To start, you'll want to run the existing code. There are three source files you'll need. Here's an example of how to run it:
 import bitstringFitness
 import ga
 b=bitstringFitness.bitstringProblem()
 g=ga.GA(bitstringFitness.bitstringFitness, 10, 100, 0.1, 10, 0.1, b)
 g.runGA()
So what's going on here? After importing the modules, we create an instance of bitstringProblem, a class designed to hold any problem-specific parameters (we don't have any yet). We then create an instance of GA. Its constructor takes a number of arguments, almost all of which have defaults, including the fitness function, the problem instance, the population size, the number of iterations, the chromosome (string) length, the elitism fraction, and the mutation rate.

When you ran this, you probably noticed that it didn't do anything interesting. That's because we haven't yet told the GA anything about the sort of solutions we're looking for. We do this by adding constraints to the bitstringProblem object. For example:
 import bitstringFitness
 import ga
 b=bitstringFitness.bitstringProblem()
 b.addConstraint(bitstringFitness.allOnes)
 g=ga.GA(bitstringFitness.bitstringFitness, 10, 100, 0.1, 10, 0.1, b)
 g.runGA()
Each constraint function should return a number less than or equal to zero indicating how good a solution is. Zero means all constraints are met; large negative numbers indicate that many constraints are violated. For example, allOnes returns -1 times the number of zeros in the chromosome. The fitness of a chromosome is the sum of the values returned by applying each constraint to it.
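
A minimal sketch of this convention follows. Problem, add_constraint, all_ones, and fitness are hypothetical stand-ins for bitstringProblem, addConstraint, allOnes, and bitstringFitness; the real functions may take extra arguments (for example, the problem instance), so check the provided source for their exact signatures.

```python
def all_ones(chromosome):
    """Constraint: -1 for every '0' in the chromosome, so 0 means satisfied."""
    return -1 * chromosome.count('0')


class Problem:
    """Hypothetical stand-in for bitstringProblem: holds a list of constraints."""

    def __init__(self):
        self.constraints = []

    def add_constraint(self, constraint):
        # The constraint is a function; we just store it like any other value.
        self.constraints.append(constraint)


def fitness(chromosome, problem):
    """Fitness is the sum of every constraint applied to the chromosome."""
    return sum(constraint(chromosome) for constraint in problem.constraints)


p = Problem()
p.add_constraint(all_ones)
print(fitness('1101101111', p))  # -2: two zeros in the string
```

Because fitness just sums whatever is in the list, adding a second constraint changes the objective without touching the GA.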

allOnes is a nice test function, but it's pretty boring. More interesting is the 'pattern' constraint. To use it, supply the pattern you'd like to discover to the bitstringProblem class, and then set the appropriate constraint. For example:
 import bitstringFitness
 import ga
 b=bitstringFitness.bitstringProblem('0110111001')
 b.addConstraint(bitstringFitness.matchPattern)
 g=ga.GA(bitstringFitness.bitstringFitness, 10, 100, 0.1, 10, 0.1, b)
 g.runGA()
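
One natural way to score a pattern match is the negated number of positions that differ from the target (the negative Hamming distance), so an exact match scores 0. The sketch below assumes that scoring and passes the pattern directly; match_pattern is a hypothetical name, and the provided matchPattern presumably reads the pattern from the bitstringProblem instance and may score differently.

```python
def match_pattern(chromosome, pattern):
    """Hypothetical pattern constraint: -1 for each position that differs
    from the target pattern, so an exact match scores 0."""
    if len(chromosome) != len(pattern):
        raise ValueError("chromosome and pattern must be the same length")
    return -sum(1 for c, p in zip(chromosome, pattern) if c != p)


print(match_pattern('0110111001', '0110111001'))  # 0
print(match_pattern('1110111000', '0110111001'))  # -2
```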
Report, pt 1: For this portion, you will evaluate how long it takes the genetic algorithm to solve problems of different sizes. Run the GA on the matching problem for bitstrings of length 10, 25, 50, 100, and 200. Use a population of size 50, elitism=0.1, and mutationRate=0.1. Prepare a graph with iterations on the x axis, fitness of the best solution on the y axis, and 1 line for each experiment. How does increasing the length of the string affect the time needed to find a solution?
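
For these experiments you need a target pattern for each length and a way to read "time to solution" off a run. The sketch below assumes you have recorded the best fitness at each iteration in a list (the provided GA may not expose this directly, so you may need to collect it yourself); random_pattern and iterations_to_solve are hypothetical helpers.

```python
import random


def random_pattern(length, rng=random):
    """Generate a random target bitstring of the given length."""
    return ''.join(rng.choice('01') for _ in range(length))


def iterations_to_solve(best_history):
    """Return the first iteration (0-based) whose best fitness is 0,
    or None if the run never found an exact match."""
    for iteration, best in enumerate(best_history):
        if best == 0:
            return iteration
    return None


rng = random.Random(42)  # fixed seed so the targets are reproducible
targets = {n: random_pattern(n, rng) for n in (10, 25, 50, 100, 200)}
print(iterations_to_solve([-5, -3, -1, 0, 0]))  # 3
print(iterations_to_solve([-5, -4, -4]))        # None
```

Reporting None separately from a number matters here: for longer strings, some runs may not reach fitness 0 within the iteration limit.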

Report, pt 2: As you have probably noticed, the GA has a number of parameters that must be set. A challenge with this sort of technique is to choose the best values for each of the parameters. In this portion of the assignment, you will test the effect of varying mutation rate on solution quality and convergence rate.

For the pattern matching problem with a string of length 50, run the GA with a population of size 50, elitism=0.1, and mutation rates of 0, 0.1, 0.25, 0.5, and 0.8. Prepare a graph with iterations on the x axis, fitness of the best solution on the y axis, and 1 line for each experiment. How does mutation rate affect the time needed to find a solution, and the quality of the solution found?
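
To reason about these rates, it helps to see what mutation does to a chromosome. The sketch below assumes mutationRate is a per-bit flip probability; the provided GA may instead apply it once per chromosome, so check ga.py before interpreting your results. mutate is a hypothetical name.

```python
import random


def mutate(chromosome, mutation_rate, rng=random):
    """Flip each bit independently with probability mutation_rate.

    Assumption: mutation_rate is a per-bit probability, not per-chromosome.
    """
    flipped = []
    for bit in chromosome:
        if rng.random() < mutation_rate:
            flipped.append('1' if bit == '0' else '0')
        else:
            flipped.append(bit)
    return ''.join(flipped)


print(mutate('0000000000', 0.0))  # 0000000000: rate 0 never changes anything
```

Under this interpretation, the expected number of flips is mutation_rate times the string length, so a rate of 0.8 on a length-50 string flips about 40 bits per mutation, which behaves much like random search.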

Report, pt 3: In this portion of the assignment, you will examine the effects of elitism. Currently, the GA always keeps the best solution from the previous iteration, plus a fraction of strong-performing solutions. We will vary this fraction.

For the pattern matching problem with a string of length 50, run the GA with a population of size 50, mutationRate of 0.1, and elitism of 0, 0.1, 0.25, 0.5, and 0.75. Prepare a graph with iterations on the x axis, fitness of the best solution on the y axis, and 1 line for each experiment. How does elitism affect the time needed to find a solution, and the quality of the solution found?
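
A minimal sketch of elitist survival, assuming the elitism fraction determines how many top-ranked chromosomes are copied unchanged into the next generation. Whether the provided GA keeps "best + fraction" or "fraction, with at least the best" is an assumption here; this version keeps max(1, floor(elitism * N)) so that elitism=0 still preserves the single best solution. elite_survivors is a hypothetical name.

```python
def elite_survivors(population, fitness, elitism):
    """Return the chromosomes copied unchanged into the next generation.

    Keeps the top max(1, int(elitism * N)) chromosomes, so the best one
    always survives even when elitism is 0.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    n_keep = max(1, int(elitism * len(population)))
    return ranked[:n_keep]


def score(chromosome):
    return -chromosome.count('0')  # the allOnes-style constraint


population = ['000', '111', '011', '001']
print(elite_survivors(population, score, 0.5))  # ['111', '011']
print(elite_survivors(population, score, 0))    # ['111']
```

The remaining slots are then filled by selection, crossover, and mutation. A large elitism fraction leaves fewer slots for new offspring, which is the trade-off this experiment probes.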

Coding, pt 1: In lecture, we discussed two different methods for doing selection: tournament selection and roulette selection. This code uses tournament selection. Add a separate method to the GA class called chooseChromosomeR() that performs roulette selection. Prepare a graph that compares the performance of the GA using roulette and tournament selection on the length-50 pattern, using the best parameters you have found for mutation rate and elitism.
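
One pitfall: the fitness values in this code are all less than or equal to zero, so they can't be used directly as slice sizes on a roulette wheel. The sketch below is a standalone function rather than the chooseChromosomeR() method you'll write, and its name, signature, and +1 shift are assumptions. It shifts scores so the worst chromosome gets a small but nonzero slice, then spins the wheel.

```python
import random


def choose_chromosome_roulette(population, fitness, rng=random):
    """Roulette-wheel (fitness-proportionate) selection.

    Fitness values here are <= 0, so they are shifted to be positive
    before being used as slice sizes on the wheel.
    """
    scores = [fitness(c) for c in population]
    worst = min(scores)
    # Shift so the worst chromosome still gets a slice of size 1.
    weights = [s - worst + 1 for s in scores]
    spin = rng.uniform(0, sum(weights))
    running = 0.0
    for chromosome, weight in zip(population, weights):
        running += weight
        if spin <= running:
            return chromosome
    return population[-1]  # guard against floating-point round-off


def score(chromosome):
    return -chromosome.count('0')


rng = random.Random(0)
picks = [choose_chromosome_roulette(['111', '000'], score, rng) for _ in range(10000)]
print(picks.count('111') / len(picks))  # roughly 0.8 (weights are 4 and 1)
```

The choice of shift matters: adding 1 keeps weak chromosomes in play, while a tiny shift makes the worst chromosome almost never selected. Rank-based weights are a common alternative if fitness values span very different ranges.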


Using GAs for scheduling



In this portion of the assignment, you'll apply GAs to a somewhat more real-world problem, that of scheduling nurses to work shifts.

The problem can be stated as follows: given a set of nurses n1-nk, a set of shifts s1-sj, and a set of constraints c1-cm, find an assignment of nurses to shifts such that all constraints are satisfied (or, if this is impossible, such that the constraints are minimally violated).

We can represent the nurse scheduling problem as a matrix, with nurses on the rows and shifts on the columns. A 1 in a cell of the matrix indicates that the nurse is scheduled for that shift, and a 0 indicates he/she is not. For example:
    s1  s2  s3
n1   0   1   1
n2   1   0   0
n3   0   1   0

Reading the matrix row by row, we can encode this as the bitstring 011100010.
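
The row-by-row encoding means nurse n's entry for shift s sits at bit position n * num_shifts + s (counting from 0). A sketch of the conversion in both directions; encode and decode are hypothetical helpers, not part of nurseFitness.

```python
def encode(matrix):
    """Flatten a nurse-by-shift 0/1 matrix into a bitstring, row by row."""
    return ''.join(str(cell) for row in matrix for cell in row)


def decode(bits, num_nurses, num_shifts):
    """Rebuild the matrix: row n holds nurse n's entries for every shift."""
    if len(bits) != num_nurses * num_shifts:
        raise ValueError("bitstring length must equal nurses * shifts")
    return [[int(bits[n * num_shifts + s]) for s in range(num_shifts)]
            for n in range(num_nurses)]


schedule = [[0, 1, 1],
            [1, 0, 0],
            [0, 1, 0]]
print(encode(schedule))  # 011100010
```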

To begin, run the GA with the nursing problem on a simple example to get a feel for it. I've provided you with two sample constraints: oneNursePerShift, which says that each shift should have exactly one nurse working it, and oneShiftEach, which says that each nurse should work exactly one shift. (For there to be a solution, there must be the same number of nurses as shifts.)
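
The sketch below shows one way such constraints could score a row-major chromosome: -1 for each shift (or nurse) whose count is not exactly one, so a fully satisfied constraint returns 0. The function names and the extra num_nurses/num_shifts arguments are assumptions; the provided oneNursePerShift and oneShiftEach likely get these sizes from the nurseProblem instance and may count violations differently.

```python
def _works(bits, nurse, shift, num_shifts):
    """True if the nurse is scheduled for the shift (row-major layout)."""
    return bits[nurse * num_shifts + shift] == '1'


def one_nurse_per_shift(bits, num_nurses, num_shifts):
    """-1 for every shift that does not have exactly one nurse."""
    violations = 0
    for s in range(num_shifts):
        working = sum(_works(bits, n, s, num_shifts) for n in range(num_nurses))
        if working != 1:
            violations += 1
    return 0 - violations


def one_shift_each(bits, num_nurses, num_shifts):
    """-1 for every nurse who does not work exactly one shift."""
    violations = 0
    for n in range(num_nurses):
        shifts = sum(_works(bits, n, s, num_shifts) for s in range(num_shifts))
        if shifts != 1:
            violations += 1
    return 0 - violations


# The 3x3 example above: shift s2 has two nurses, and nurse n1 has two shifts.
print(one_nurse_per_shift('011100010', 3, 3))  # -1
print(one_shift_each('011100010', 3, 3))       # -1
```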
We can run this like so:
>>> import ga
>>> import nurseFitness
>>> n=nurseFitness.nurseProblem(5,5)
>>> n.addConstraint(nurseFitness.oneNursePerShift)
>>> n.addConstraint(nurseFitness.oneShiftEach)
>>> g=ga.GA(nurseFitness.nurseFitness, 20, 100, 0.1, 25, 0.1, n)
>>> g.runGA()
This will create a nurseProblem with 5 nurses and 5 shifts, attach the two constraints, and then run the GA with a population of 20 individuals for 100 iterations, with elitism and mutationRate both set to 0.1 and a string length of 25 (one bit for each of the 5 x 5 nurse/shift pairs).

Coding, pt 2: Your job is to code some more interesting constraints for a larger problem. Here are the parameters:

We want to staff a floor of a hospital for a one-week schedule. The hospital has three 8-hour shifts per day, 7 days a week, for a total of 21 shifts. There are 10 nurses who work on the floor. You should write constraint functions for each of these constraints. (In other words, write four separate functions.) Each function should return 0 minus the number of violations of its constraint, so a fully satisfied constraint returns 0. You will want to test each of these constraints separately.
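
With 10 nurses and 21 shifts, each chromosome is 10 x 21 = 210 bits. Writing constraints is much easier with a helper that maps (nurse, day, slot) to a bit position. The sketch below assumes the row-major layout used above and that shift columns are numbered day by day (day 0 is columns 0-2, day 1 is columns 3-5, and so on). The constraint shown is purely illustrative and hypothetical; it is not meant as one of the four you are asked to write.

```python
NURSES = 10
DAYS = 7
SHIFTS_PER_DAY = 3
SHIFTS = DAYS * SHIFTS_PER_DAY         # 21 shifts in the week
CHROMOSOME_LENGTH = NURSES * SHIFTS    # 210 bits


def bit_index(nurse, day, slot):
    """Position of (nurse, day, slot) in a row-major chromosome.

    Assumes shift columns are ordered day by day, three slots per day.
    """
    return nurse * SHIFTS + day * SHIFTS_PER_DAY + slot


def at_most_one_shift_per_day(bits):
    """Illustrative (hypothetical) constraint: -1 for every (nurse, day)
    pair in which the nurse is scheduled for more than one shift."""
    violations = 0
    for nurse in range(NURSES):
        for day in range(DAYS):
            worked = sum(bits[bit_index(nurse, day, slot)] == '1'
                         for slot in range(SHIFTS_PER_DAY))
            if worked > 1:
                violations += 1
    return 0 - violations


print(CHROMOSOME_LENGTH)                              # 210
print(at_most_one_shift_per_day('111' + '0' * 207))   # -1: nurse 0 works all of day 0
```

Testing each constraint on small hand-built chromosomes like this one, before running the GA, makes it much easier to tell a buggy constraint from a GA that simply hasn't converged.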

Report, pt 4: Once you have coded these constraints, run your GA with a mutation rate of 0.1, a population of (at least) 100, and (at least) 1000 iterations. Vary elitism over 0, 0.1, 0.25, 0.5, and 0.75. Prepare a graph with iterations on the x axis, fitness of the best solution on the y axis, and 1 line for each experiment. How does elitism affect the time needed to find a solution, and the quality of the solution found? Is the effect different than for the matching problem? Is the GA always able to find a consistent schedule?

Repeat the previous experiment, but fix elitism at 0.1 and vary population size over 10, 25, 50, 100, 500, and 1000. Prepare a graph with iterations on the x axis, fitness of the best solution on the y axis, and 1 line for each experiment. How does population size affect the time needed to find a solution, and the quality of the solution found?
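
Because a GA is randomized, a single run per setting can be misleading, and answering "is the GA always able to find a consistent schedule?" requires several runs. A sketch for averaging per-iteration best-fitness curves across runs, assuming each run is recorded as a list; average_curves is a hypothetical helper. Runs that stopped early (for example, after reaching fitness 0) are padded with their last value so all curves have the same length.

```python
def average_curves(curves):
    """Average several best-fitness-per-iteration curves point by point.

    Shorter curves are padded with their final value before averaging.
    """
    length = max(len(c) for c in curves)
    padded = [c + [c[-1]] * (length - len(c)) for c in curves]
    return [sum(values) / len(values) for values in zip(*padded)]


print(average_curves([[-4, -2, 0], [-6, -4]]))  # [-5.0, -3.0, -2.0]
```

Plotting the averaged curve for each setting, and noting how many runs reached 0, gives a fairer comparison than one run each.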