USAGE neurosys/solve

Building Your Own Models
------------------------
1.  Introduction.  The solver exploits the existence of multiple
    processes using a method called "data parallelism."  Essentially
    this means that the data (neurons and variables) are partitioned
    among the processes, and each process is responsible for solving
    equations modelling its assigned neurons.  The current version of
    the program assumes that the number of processes evenly divides
    the number of neurons.  If there are n neurons and p processes,
    it assigns the first n/p to process 0, the next n/p to process 1,
    etc.
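
    The block assignment described above can be sketched in C as
    follows.  This is only an illustration:  the function names
    first_neuron_of, last_neuron_of, and owner_of are hypothetical
    and are not part of the solver source.

```c
#include <assert.h>

/* Hypothetical helpers (not part of the solver):  block partition of
 * total_neurons neurons among num_procs processes, assuming that
 * num_procs evenly divides total_neurons. */

/* Global rank of the first neuron assigned to process proc_rank. */
int first_neuron_of(int proc_rank, int total_neurons, int num_procs)
{
    assert(num_procs > 0 && total_neurons % num_procs == 0);
    return proc_rank * (total_neurons / num_procs);
}

/* Global rank of the last neuron assigned to process proc_rank. */
int last_neuron_of(int proc_rank, int total_neurons, int num_procs)
{
    return first_neuron_of(proc_rank, total_neurons, num_procs)
           + total_neurons / num_procs - 1;
}

/* Rank of the process that owns the neuron with the given global rank. */
int owner_of(int global_neuron_rank, int total_neurons, int num_procs)
{
    assert(num_procs > 0 && total_neurons % num_procs == 0);
    return global_neuron_rank / (total_neurons / num_procs);
}
```

    With 20 neurons and 4 processes, first_neuron_of(1, 20, 4) is 5
    and last_neuron_of(1, 20, 4) is 9.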

    Communication among the processes is required when the solution to
    a neuron's equations requires data from neurons assigned to other
    processes.  We assume that this can only happen with synaptic
    currents.  So the principal source of communication, and hence
    parallel overhead, is communication of synaptic currents.  In
    addition to this overhead, input data is read in by process 0 and
    distributed among the processes, and, if the output is
    undistributed, solutions to the equations are gathered onto process
    0, which is responsible for printing them.

    We have attempted to reduce the cost of communication by storing the
    synaptic currents in an array separate from the array that stores
    all the other variables.  So the C code for your equations should
    involve two arrays: one for synaptic currents and one for all other
    variables.

    We assume that each model has exactly one variable representing
    membrane voltage and exactly one variable representing synaptic
    current.  There can be any number of other variables.  In the
    sample code, input values to the equations are stored in two
    arrays:  y and syn_input.  The syn_input array contains
    synaptic currents, and the y array contains membrane voltages
    and other, non-synaptic, variables.  The output values are 
    also stored in two arrays:  answer and syn_answer; syn_answer
    stores the computed synaptic currents, and answer stores the
    other computed values.

    In order to reduce storage, whenever possible arrays are
    allocated to store only local data.  In the sample code 
    only syn_input is global:  it stores the synaptic currents
    for *all* of the neurons, both local and nonlocal.  The other
    arrays -- y, answer, and syn_answer -- only store information
    on variables assigned to the process.  This distinction
    introduces some confusion into the subscripting process.  So 
    be careful to use *local* subscripts with all the arrays
    except syn_input.

    A short example will clarify the difference:  suppose that there
    are 4 neurons, 2 processes, and each neuron is modelled by 3
    equations in addition to its synaptic current.  Further suppose
    that the membrane voltage is the first variable.  On the first
    process (process 0), "local" and "global" subscripts are the same.
    So to access the membrane voltage for the second neuron, you would
    use y[3] or answer[3], since elements 0, 1, and 2 are the
    variables associated with the first neuron.  To access the synaptic
    current associated with the second neuron, you would use
    syn_input[1] or syn_answer[1].  On the second process (process
    1), global and local subscripts are different.  In order to access
    the membrane voltage associated with the fourth (global) neuron,
    you would also use y[3] or answer[3], since y and answer are
    local.  However, in order to access the synaptic current associated
    with the fourth neuron, you would use syn_input[3] and
    syn_answer[1], since syn_input is global and syn_answer is local.
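
    The subscript arithmetic in this example can be written out as a
    small sketch.  It assumes that every neuron has the same number of
    non-synaptic variables, as in the example; the function names are
    hypothetical and do not appear in the solver.

```c
#include <assert.h>

/* Hypothetical helpers (not part of the solver).  They assume every
 * neuron has the same number of non-synaptic variables. */

/* Convert a global neuron rank to a local one, given the global rank
 * of the first neuron assigned to this process. */
int local_neuron_rank_of(int global_neuron_rank, int first_local_neuron)
{
    assert(global_neuron_rank >= first_local_neuron);
    return global_neuron_rank - first_local_neuron;
}

/* Subscript into a local array such as y or answer.  var_offset is the
 * position of the variable within its neuron (0 for membrane voltage
 * in the example above). */
int local_var_index(int local_neuron_rank, int vars_per_neuron,
                    int var_offset)
{
    assert(var_offset >= 0 && var_offset < vars_per_neuron);
    return local_neuron_rank * vars_per_neuron + var_offset;
}
```

    On process 1 in the example, the first local neuron has global
    rank 2.  So global neuron 3 has local rank 1, its membrane voltage
    is y[local_var_index(1, 3, 0)], i.e., y[3], its computed synaptic
    current is syn_answer[1], and its synaptic input is syn_input[3].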

    There are several global (in the C sense) variables that can help
    with subscripting:

        my_first_neuron   my_first_var
        my_last_neuron    my_last_var

    These are subscripts for the first and last neurons or variables
    assigned to a process.  In order to avoid repeated recalculations,
    there are also counts of how many equations and how many neurons
    there are altogether, and how many are assigned to a process:

        total_equations   total_neurons
        num_my_equations  num_my_neurons

    total_equations and num_my_equations represent numbers of
    variables, *not* including the synaptic currents.  These variables
    (and several others) are declared in the file globals.c.

    There are several macros defined in equations.c that may make 
    things easier:

        EQN_COUNT(global_neuron_rank):  specifies the number of
            equations (not including the synaptic variable)
            associated with the neuron with global neuron rank
            global_neuron_rank

        MODEL_FCN(global_neuron_rank):  if you have more than one
            model, it may be convenient to define an array of
            pointers to the functions that represent the models.
            See sample2 for an example.  This macro calls the
            function associated with the model for the neuron 
            with global neuron rank global_neuron_rank.

    The interconnections among the neurons are specified by an
    adjacency list.  The synaptic inputs to each neuron are a sequence
    of pairs:  the first value in a pair is the global neuron rank and
    the second value is the sign (+1 or -1) of the synaptic
    connection.  Each process stores this information in an array of
    structs.  The name of the array is syn_connections.  It's declared
    in globals.c and the array subscripts are *local* neuron ranks.  
    Each element of the array corresponds to the synaptic inputs for a
    local neuron.  The struct has 3 members:

        dim:  the number of synaptic inputs,
        links:  the global neuron ranks of the neurons from which
            synaptic input is received, and
        signs:  the sign of the synaptic input received from the
            neuron in the corresponding entry in links (the
            sign information may not be used in your models).

    The following macros are defined in equations.c.  They may be useful
    for accessing data in syn_connections.

        INPUT_NEURON_TYPE(local_neuron_rank, j):  this will determine
            the model type of the jth synaptic input to the neuron with 
            local neuron rank local_neuron_rank.  For example, suppose 
            the synaptic connections look like this (use a uniform width 
            font):

                          1
                    0 <--------- 1
                    ^            ^
                  1 |            | -1
                    |            |
                    3 <--------- 2
                          -1

            Also suppose that there are two model types, and neurons 0 
            and 3 have model type 0 and neurons 1 and 2 have model type
            1.  Then if there are two processes, on process 0

                INPUT_NEURON_TYPE(0,0) = 1, and
                INPUT_NEURON_TYPE(0,1) = 0.
            
        INPUT_NEURON_RANK(local_neuron_rank, j):  this will determine
            the global rank of the jth neuron sending synaptic input 
            to the neuron with local rank local_neuron_rank.  Using
            the same example, on process 0,

                INPUT_NEURON_RANK(0,0) = 1, and
                INPUT_NEURON_RANK(0,1) = 3.

        INPUT_NEURON_SIGN(local_neuron_rank, j):  if you are making
            use of the sign associated with synaptic currents, this
            will determine the sign of the jth neuron sending synaptic
            input to the neuron with local rank local_neuron_rank.
            Using the same example, on process 0,

                INPUT_NEURON_SIGN(0,0) = 1, and
                INPUT_NEURON_SIGN(0,1) = 1.

        SYNAPTIC_INPUT(local_neuron_rank, j):  if you call the global
            array of synaptic currents syn_input, then this macro
            will determine the synaptic current associated with the
            jth neuron sending synaptic input to the neuron with
            local rank local_neuron_rank.  Note this macro depends
            on the use of the array name syn_input.
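
    As an illustration of how the connection data and the global
    syn_input array fit together, here is a minimal sketch.  The
    struct below is a stand-in:  the member names dim, links, and
    signs follow the description above, but the member types, the
    type name, and the function name are assumptions.  Whether a
    model sums signed synaptic currents in this way is also an
    assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one element of syn_connections.  Member names follow
 * the text; the types are assumptions. */
typedef struct {
    int dim;           /* number of synaptic inputs               */
    const int *links;  /* global ranks of the input neurons       */
    const int *signs;  /* +1 or -1 for each entry in links        */
} syn_conn_sketch_t;

/* Sum the signed synaptic currents arriving at one local neuron.
 * syn_input is indexed by *global* neuron rank. */
double signed_input_sum(const syn_conn_sketch_t *conn,
                        const double *syn_input)
{
    double sum = 0.0;
    int j;

    assert(conn != NULL && syn_input != NULL);
    for (j = 0; j < conn->dim; j++)
        sum += conn->signs[j] * syn_input[conn->links[j]];
    return sum;
}
```

    For neuron 0 in the diagram above, links would hold 1 and 3 and
    signs would hold 1 and 1, so the sum is syn_input[1] +
    syn_input[3].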

2.  Creating the equations files.  It's easiest to model your source
    on one of the samples.

    a) first edit the file equations.c
       - Add identifiers for symbolic "constants" to the preamble. These
         are parameters modelling your equations that you may want
         to experiment with -- typically such things as sodium
         conductance, equilibrium voltage, etc.  Ints should be 
         listed before doubles.
       - Define various values needed by the solver:
         MODEL_COUNT -- the number of different neuron models
         EQUATION_COUNTS -- the number of equations used to model
             each type of neuron.  

                 N.B. Equations for synaptic currents should *not* 
                 be included in these counts.

             For example, if one of your neuron models uses V, n, 
             m, and h variables, and a synaptic current variable s, 
             then the value for EQUATION_COUNTS for this model
             should be 4 -- not 5.  This should be an array
             even if MODEL_COUNT is 1.
         VOLTAGE_EQN_OFFSETS -- for each model, if the variables
             are ordered, which variable represents membrane
             VOLTAGE?  This *should* be 0 for each model.  If
             it isn't, you'll need to edit the file output.c
             (instructions are included in that file).  This
             should also be an array.
      - Other variables/constants you'll need to write your
        equations.
      - Write void functions Setup_f() and Cleanup_f().  Setup_f()
        will be called by the solver to do any setup needed before
        starting the solve (e.g., allocate arrays needed by the 
        solver, initialize variables), and Cleanup_f() will be called 
        by the solver after completing the solution to do any cleanup
        (e.g., free arrays).
      - Write a Compute_f() function that will call the functions
        defining your neuron models.  This function and the Setup_f()
        and Cleanup_f() functions are the only function interfaces with 
        the rest of the program.  So if you change their arg lists or
        return values, you will need to change the calls in the
        solver.
      - Write the functions you need to define your model.  Keep in
        mind the fact that all of the variables associated with a
        neuron are stored in one array -- except the synaptic current.
        All of the synaptic currents are stored in another array.
        All of the variables in globals.c are accessible in equations.c.
        Note that none of these variables should be modified in equations.c.
        See the general notes above for further information.
        
    b)  Edit equations.h and (if necessary) the solver to reflect
        the changes you've made to equations.c.  In addition, in
        equations.h define the macros INT_COUNT, DOUBLE_COUNT, and
        ALL_GATHER.  INT_COUNT is the number of integer constants
        in the preamble to equations.c.  DOUBLE_COUNT is the number
        of double constants.   The default for ALL_GATHER is
        MPI_Allgather -- capitalization is important.  If you're
        not sure about this, use MPI_Allgather, and see the 
        discussion of performance below for further info.

    c)  Create a file called "constant_list" that contains the
        names of the constants you've specified in the preamble to
        equations.c.  This will be used by the program in 
        Create_read_constants to build the functions your program
        will use to read and distribute these constants among the
        processes.  Use the same order you used in equations.c.
        In particular, ints should be listed first. 


Compiling
---------
1.  Before you can start compiling, you will need the following
    three files:

        equations.c
        equations.h
        constant_list

    If this is the first time you're using Neurosys, it is probably
    a good idea to get these files from one of the sample directories,
    ./sample1 or ./sample2.  See the preceding discussion for information
    on building your own.

2.  Define the macros in Makefile

    a) CC:  you can usually omit this

    b) CFLAGS:  choose optimization and warning levels.  Additional
       flags are -DDEBUG and -DDIST_OUTPUT.  The DEBUG flag will
       generate *huge* amounts of output.  So use it with caution.
       If you do need it, it may be a good idea to define DEBUG just
       in the source file(s) you want to debug, rather than defining
       it for all the source.

       The DIST_OUTPUT flag will compile the system so that each
       process writes its part of the output to a separate
       file.  Generally speaking this should be used only in
       production runs.  See the section below on output for
       further information.

    c) PROF, PROFLIB:  Define these macros if you want to study the
       performance of the solver.  The "totals only" definitions in the
       makefile will tell you the total amount of time each process
       spent doing local computations and communication routines.  The 
       other definitions are only appropriate if you have MPICH
       installed with the MPE options and either upshot or jumpshot.
       See the MPICH documentation (http://www.mcs.anl.gov/mpi) for
       information on using these utilities.

    d) LDFLAGS:  any special flags you want passed to the linker

    e) INCLUDE:  directories in which to search for include files

    f) LIB:  paths for the MPI libraries and other libraries that need to
       be linked with the program.

3.  make


Running
-------
1.  There are three sources of input to the solver

    a) A file containing values for the "constants" in equations.c --
       typically one value per line.

    b) A file containing other input data, specifically
          starting time
          size of the time steps (in seconds).  The Runge-Kutta solver
              seems to work well with a stepsize of 0.01.
          total number of steps for the solver
          total number of equations -- including synaptic currents
          total number of neurons -- must be evenly divisible by the number 
              of processes
          number of rows of neurons (currently unused, but must be included)
          number of columns of neurons (currently unused, but must be included)
          frequency with which data should be printed.   If the step is
              a multiple of this, membrane voltages for that step will
              be printed.  For example, if this value is 10, then the program
              will print data for the zeroth step (initial conditions),
              the tenth step, the twentieth step, etc.
          name of the file in which output should be stored.  If the
              program has been compiled with the DIST_OUTPUT option, 
              each process will print to a file with this name followed
              by a period and the process rank -- e.g., if this value
              is "outfile", then the output will be stored in outfile.0, 
              outfile.1, etc.  If the DIST_OUTPUT option hasn't been
              used, then all of the output will go to a file with this
              name.
          List of the neuron types.  If you are using n neuron models,
              then the types are 0, 1, . . . , n-1.  This list specifies
              which type each neuron has.  For example, if there are two
              neuron types, four neurons, and this list contains the values
              0 1 1 0, then neurons 0 and 3 will have type 0, while
              neurons 1 and 2 will have type 1.
          Initial conditions for all of the variables -- including 
              synaptic currents.  The values for the first neuron should
              be first, then the values for the second neuron, etc.  Ordering 
              for values for a single neuron will coincide with the order 
              in which the values are stored in the arrays.   Membrane
              voltages should be first and synaptic currents last.
              For example, if the variables associated with a neuron are
              V (membrane voltage), h, n, and s (synaptic current), and
              there are two neurons, neuron 0 and neuron 1, then the
              initial conditions should be ordered V_0, h_0, n_0, s_0,
              V_1, h_1, n_1, s_1.
          A list of the number of synaptic inputs to each neuron.  For
              example, if there are four neurons, and this contains the
              values 2 1 0 1, then neuron 0 has two synaptic inputs, 
              neurons 1 and 3 one synaptic input each, and neuron 2 no 
              synaptic inputs.
          The adjacency list.  This should contain one line for each
              neuron.  The values on a line correspond to the synaptic
              inputs to the neuron corresponding to that line:  first
              line neuron 0, second line neuron 1, etc.  The data on
              a line should agree with the synaptic input counts listed
              above in that there should be *twice* as many values per
              line as synaptic inputs.  The values are paired -- the 
              first number of the pair is the neuron from which input is
              received, the second indicates whether the input is
              excitatory (+1) or inhibitory (-1).  The second value
              may be unused in your model, but it still needs to be
              included.  For example, suppose there are four neurons
              interconnected as follows (use a uniform width font)

                           1 
                     0 <------- 1
                     ^          ^
                   1 |          | -1
                     |          |
                     3 <------- 2
                          -1

              Then the four neurons have 2, 1, 0, and 1 synaptic inputs, 
              respectively, and this section reads

                  1 1 3 1
                  2 -1

                  2 -1

              That is, neuron 0 has two synaptic inputs, one from neuron 1,
              and one from neuron 3.  Both of its inputs are excitatory.
              Neuron 1 has one synaptic input:  an inhibitory input from
              neuron 2.  Neuron 2 has no synaptic inputs.  Neuron 3 has
              one inhibitory input from neuron 2.

    c) The final source of input simply tells the system where the 
       preceding inputs are coming from.  
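
    The adjacency list section of item b) can be generated from a
    list of connections.  The sketch below is only an illustration
    and is unrelated to ../util/inputgen.c:  the type edge_sketch_t
    and the function format_adjacency are hypothetical names.  It
    writes one line per neuron, with a (source, sign) pair for each
    synaptic input and an empty line for a neuron with no inputs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical description of one synaptic connection. */
typedef struct {
    int from;  /* global rank of the neuron sending input   */
    int to;    /* global rank of the neuron receiving input */
    int sign;  /* +1 excitatory, -1 inhibitory              */
} edge_sketch_t;

/* Write the adjacency list into buf, one line per neuron.  Returns the
 * number of characters written, or -1 if buf is too small. */
int format_adjacency(const edge_sketch_t *edges, int edge_count,
                     int neuron_count, char *buf, size_t buf_size)
{
    size_t used = 0;
    int n, e;

    assert(buf != NULL && buf_size > 0);
    buf[0] = '\0';
    for (n = 0; n < neuron_count; n++) {
        int first = 1;
        for (e = 0; e < edge_count; e++) {
            int written;
            if (edges[e].to != n)
                continue;
            written = snprintf(buf + used, buf_size - used, "%s%d %d",
                               first ? "" : " ",
                               edges[e].from, edges[e].sign);
            if (written < 0 || (size_t) written >= buf_size - used)
                return -1;
            used += (size_t) written;
            first = 0;
        }
        if (used + 1 >= buf_size)
            return -1;
        buf[used++] = '\n';   /* end of this neuron's line */
        buf[used] = '\0';
    }
    return (int) used;
}
```

    Given the four connections in the diagram (1 to 0 with sign 1,
    3 to 0 with sign 1, 2 to 1 with sign -1, and 2 to 3 with sign
    -1), the buffer holds the same four lines shown in the example
    for item b).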

    Notes:

    i) Although it's possible for input to be typed in from the
       keyboard, because of the amount of input, it's probably a good
       idea to put the constants (item a) and the main input data (item
       b) into files.

    ii) The program uses a rudimentary form of the C fscanf function,
       which will skip white space and comments:  a comment is any 
       information preceded by a "#" and terminated by a newline.
       The current version recognizes the %f, %lf, %d, %c, and %s
       format specifiers -- so it can read floats, doubles, ints,
       chars, and strings.

    iii) In the utilities directory ../util, there's a C program,
       inputgen.c, which can be used to automatically generate
       input files.

    iv) Note that the total number of neurons must be evenly divisible
       by the number of processes.
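
    The comment convention in note ii can be illustrated with a short
    sketch.  This is not the program's input routine:  it works on an
    in-memory string rather than a file, handles only ints, and the
    function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Return a pointer to the first character of p that is neither white
 * space nor part of a comment.  A comment starts with '#' and runs to
 * the end of the line. */
const char *skip_space_and_comments(const char *p)
{
    while (*p != '\0') {
        if (*p == '#') {
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (isspace((unsigned char) *p)) {
            p++;
        } else {
            break;
        }
    }
    return p;
}

/* Read the next int from *cursor, skipping white space and comments.
 * Returns 1 and advances *cursor on success, 0 if no int is found. */
int read_next_int(const char **cursor, int *value)
{
    const char *start;
    char *end;
    long parsed;

    assert(cursor != NULL && *cursor != NULL && value != NULL);
    start = skip_space_and_comments(*cursor);
    parsed = strtol(start, &end, 10);
    if (end == start)
        return 0;
    *value = (int) parsed;
    *cursor = end;
    return 1;
}
```

    Applied to the text "# step count" followed on the next line by
    "100 # trailing", the first call to read_next_int stores 100.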

2. Output

   Currently the output from the solver can take one of two forms:
   undistributed or distributed output.  For undistributed output,
   the system stores in a single file the values of time and the 
   membrane voltages, all as floating-point numbers in ascii.  

   For distributed output, only membrane voltages are stored, and these
   are stored as ints in order to reduce file sizes.  If there are p
   processes, the values are stored in p files; the filenames are
   generated by appending the process rank to the filename specified by
   the user.  For example, if the user specifies the name "outfile" and
   there are four processes, the output will be stored in "outfile.0",
   "outfile.1", "outfile.2", and "outfile.3".

   With undistributed output, all of the processes "block" while 
   process 0 collects and writes the output.  So your runtimes may
   be substantially reduced if you use the distributed output option.
   On the other hand, testing and debugging are usually much easier
   with undistributed output.

   When you're using the distributed output format, your runtimes may
   be further reduced if you make sure that each process writes to a 
   local file.  For example, clusters of workstations often use NFS
   to mount a user's home directory on all of the machines.  So
   accessing files stored in your home directory may involve
   communication across the network, and hence be slower than 
   accessing unshared files -- files mounted only on the local
   system.  In clusters, directories such as /tmp and /var/tmp are
   typically local.  So it may be a good idea to direct distributed
   output to one of these directories -- /var/tmp may be better 
   since it's not cleaned out if there's a system crash.

   Output can be visualized with neurondiz or a standard plotting
   package.  Distributed output can be merged into a single file
   using the program ../util/merge.c.
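
   The naming rule for distributed output files described above
   (user-specified name, a period, then the process rank) can be
   sketched as follows.  The function name dist_output_name is
   hypothetical; the solver's own code may build the name
   differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper:  build "<base>.<proc_rank>" in buf.  Returns 0
 * on success, -1 if buf is too small. */
int dist_output_name(char *buf, size_t buf_size, const char *base,
                     int proc_rank)
{
    int n;

    assert(buf != NULL && base != NULL && proc_rank >= 0);
    n = snprintf(buf, buf_size, "%s.%d", base, proc_rank);
    return (n >= 0 && (size_t) n < buf_size) ? 0 : -1;
}
```

   With base "outfile" and process rank 3, buf holds "outfile.3".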
   
        
Performance
-----------
1.  Profiling
    In order to help determine where time is being spent, there are four
    profiling options:  no profiling, totals only profiling, generic
    upshot/jumpshot profiling, and customized upshot/jumpshot profiling.
    Totals only profiling should work on any system.  It shows the time
    spent by each process in various parts of the solver.  The
    upshot/jumpshot profiling only works with the MPICH distribution of
    MPI.  Both versions of the upshot/jumpshot profiling should generate a
    log file that can be viewed with upshot or jumpshot.  However, these
    are untested.  For information on the use of upshot/jumpshot, see the
    documentation that comes with the MPICH distribution.  In order to use
    one of the profiling options, make the appropriate definitions of the
    PROF and PROFLIB macros in the Makefile.
    
2.  ALL_GATHER
    One of the most costly parts of the solver is the repeated calls to
    MPI_Allgather after each call to Compute_f.  These calls are made
    so that each process will have access to all of the synaptic
    currents.  The basic effect of these calls is that each process
    sends all of its neurons' synaptic currents to every other
    process.  However, in many cases, especially in very large
    networks, it isn't necessary for every process to have all the
    synaptic currents.  Recall that the solver assigns a block of
    consecutive neurons to each process.  So if, for example, there are
    4 processes and 20 neurons, then neurons 0, 1, 2, 3, and 4 are
    assigned to process 0, neurons 5, 6, 7, 8, and 9 are assigned to
    process 1, etc.  If there are no synaptic connections joining
    process 0's neurons to (say) process 2's or process 3's,
    then the extensive communication provided by MPI_Allgather won't be
    necessary, since process 0 won't use the synaptic currents of the
    neurons assigned to process 2 or process 3.

    An important special case of this reduced communication occurs when
    there are only synaptic connections among neurons assigned to
    consecutive processes.  For example, suppose there are 4 processes
    and there are only synaptic connections among neurons assigned to
    processes 0 and 1, processes 1 and 2, and processes 2 and 3.  So
    process 0 only needs the synaptic currents of its neurons and
    process 1's neurons; process 1 only needs its and those of
    processes 0 and 2; process 2 only needs its and those of
    processes 1 and 3; and process 3 only needs its and those of 2.
    So in this situation process 0 only needs to send its currents to 1,
    1 needs to send its to 0 and 2, etc.  This can be accomplished with
    a simple pairwise exchange of currents:  0 and 1 exchange currents,
    1 and 2 exchange currents, and 2 and 3 exchange currents.  
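
    The communication pattern in this special case can be sketched
    without any MPI calls.  The function below only computes which
    processes a given process exchanges currents with; it is not the
    Pairwise_exchange routine in comm.c, and the name
    exchange_partners is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper:  store in partners[] the ranks of the processes
 * that proc_rank exchanges synaptic currents with when connections
 * exist only between consecutive processes.  Returns the number of
 * partners (0, 1, or 2). */
int exchange_partners(int proc_rank, int num_procs, int partners[2])
{
    int count = 0;

    assert(num_procs > 0 && proc_rank >= 0 && proc_rank < num_procs);
    if (proc_rank > 0)
        partners[count++] = proc_rank - 1;   /* lower neighbour */
    if (proc_rank < num_procs - 1)
        partners[count++] = proc_rank + 1;   /* upper neighbour */
    return count;
}
```

    With 4 processes, process 0 has the single partner 1, and process
    2 has the partners 1 and 3.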

    If the synaptic connections are only between consecutive processes,
    the macro ALL_GATHER in equations.h can be defined to be 
    "Pairwise_exchange" rather than MPI_Allgather.  For limitations
    on the functionality of Pairwise_exchange, see the documentation
    in comm.c.

3.  Distributed output
    As noted in the Output section, above, performance may also be
    enhanced if each process is responsible for printing the membrane
    voltages assigned to it.  This can be accomplished by adding
    -DDIST_OUTPUT to CFLAGS in the Makefile.

4.  Solver
    Currently the only solver is a fourth order Runge-Kutta solver.  We
    hope in the future to make other solvers available.  After this has
    been done, different solvers can be chosen by defining the DE_SOLVE
    macro in solver.h.
