Department of Computer Science

University of San Francisco

CS 112

Interpreter Project
Prelim. Structure Chart due Monday, Sept. 13
Program Part A Due: Sunday, Sept. 19, Midnight
Complete Program Due: Sunday, Sept. 26, Midnight

For programming assignment 2, you should write an interpreter for the C-- (C minus-minus) language. An interpreter is a program that reads and executes the statements in another program. The C-- language is very primitive: it consists only of assignment statements. Assignment statements have the form

<identifier> '=' <expression> ';'

Expressions in C-- are just identifiers and integer literals separated by plus signs. So

y44 = 33 + 45;

zzz = 34567+  555 + y44;

are legal assignments.

Interpreters
The process of interpreting a program is typically divided into three parts: lexical analysis, parsing, and execution. As you already know, a lexical analyzer breaks a program up into tokens -- the smallest units of the program that can have meaning. If your programming language were English, the lexical analyzer would identify the individual words and punctuation marks in a document. The parser determines whether the tokens have been arranged to form valid statements in the language. So if your programming language were English, the parser would check your sentences for subjects, verbs, objects, etc.

In the execution phase, the interpreter carries out the actions specified by the statements. Note that the three parts operate together: the parser asks the lexical analyzer for the next token and determines whether the new token can be legally added to the current statement. If it can and if the new token gives the interpreter enough information to carry out an action, it will. For example, in the course of interpreting the statement

xab = pqr + 7 + mn;

the interpreter knows, after it has read the variable pqr, that it can evaluate pqr, i.e., retrieve its value from memory. After it has read the first plus sign and the 7, it can add 7 to the value it retrieved for pqr, and so on.
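The left-to-right evaluation described above can be sketched as follows. This is only an illustration: the class name IncrementalEvalSketch, the token list, and the map of variable values are assumptions standing in for your own lexer and memory, and the sketch does no syntax checking.

```java
import java.util.List;
import java.util.Map;

class IncrementalEvalSketch {
    // Adds each operand to a running total as soon as it has been read.
    // Assumes the tokens form a legal expression and every variable has a value.
    static int evaluate(List<String> tokens, Map<String, Integer> values) {
        int total = 0;
        for (String token : tokens) {
            if (token.equals("+")) {
                continue;                          // the next operand is added when it arrives
            }
            if (Character.isDigit(token.charAt(0))) {
                total += Integer.parseInt(token);  // integer literal
            } else {
                total += values.get(token);        // variable: retrieve its value
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical values for pqr and mn, chosen only for the demonstration.
        Map<String, Integer> values = Map.of("pqr", 10, "mn", 5);
        System.out.println(evaluate(List.of("pqr", "+", "7", "+", "mn"), values)); // prints 22
    }
}
```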

You wrote a lexical analyzer for programming assignment 1. For programming assignment 2, you will use your lexical analyzer as a basis for writing a C-- interpreter.

 

 

Errors
In addition to determining whether a statement is legal, the parser should print error messages when it encounters a statement it cannot parse. For example, the statement

xyz 2;

should result in an error message of the form

Parse error: expecting assignment operator

because after "xyz", the only legal input would be an assignment operator. The errors you should report are

After you encounter an error, your program can stop parsing and exit. You need not attempt to recover from the error and continue parsing.

Symbol Table
To keep track of the variables, you'll need to create a symbol table. A symbol table stores the names of variables together with the addresses they've been assigned in memory. When a new variable is assigned a value, you should add an entry to the end of the symbol table and assign it the next available address. The ith variable referred to in the source code, counting from 0, should be assigned the address i in memory. Note that all C-- programs are restricted to having fewer than 10 variables.
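A minimal sketch of such a symbol table, assuming an array-based design. The method names lookup and addIfAbsent and the capacity constant are illustrative assumptions, not requirements of the assignment.

```java
class Symbol {
    final String name;   // variable name as it appears in the source
    final int address;   // index of the variable's cell in main memory

    Symbol(String name, int address) {
        this.name = name;
        this.address = address;
    }
}

class SymbolTable {
    private static final int MAX_SYMBOLS = 10;  // assumed capacity, matching a ten-cell memory
    private final Symbol[] symbols = new Symbol[MAX_SYMBOLS];
    private int count = 0;

    // Returns the address of the named variable, or -1 if it is not in the table.
    int lookup(String name) {
        for (int i = 0; i < count; i++) {
            if (symbols[i].name.equals(name)) {
                return symbols[i].address;
            }
        }
        return -1;
    }

    // Adds the variable at the end of the table if it is new; returns its address either way.
    int addIfAbsent(String name) {
        int address = lookup(name);
        if (address >= 0) {
            return address;
        }
        if (count == MAX_SYMBOLS) {
            throw new IllegalStateException("symbol table is full");
        }
        symbols[count] = new Symbol(name, count);
        return count++;
    }
}
```

With this design, calling addIfAbsent for x, z, and then x again yields addresses 0, 1, and 0.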

Main Memory
Main memory should be coded as a class with a data member that is an array of ten integers.
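One possible shape for this class is sketched below. The method names store, load, and dump are assumptions chosen for illustration; only the ten-integer array comes from the description above.

```java
class MainMemory {
    private static final int SIZE = 10;
    private final int[] cells = new int[SIZE];  // Java initializes every cell to 0

    // Stores a value in the cell at the given address.
    void store(int address, int value) {
        cells[address] = value;
    }

    // Returns the value currently held in the cell at the given address.
    int load(int address) {
        return cells[address];
    }

    // Builds a listing with one "address value" pair per line.
    String dump() {
        StringBuilder listing = new StringBuilder();
        for (int i = 0; i < SIZE; i++) {
            listing.append(i).append('\t').append(cells[i]).append('\n');
        }
        return listing.toString();
    }
}
```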

Reading from a File
Your program should get its input from a file. The user should be allowed to input any file name at the beginning of the program. Your program should then read the entire contents of the file into the InputString text (i.e., you need to modify the readInput method of your lexer).
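Reading a whole file into one string might look like the sketch below. The class name FileInputSketch and the method readWholeFile are hypothetical; in your program this logic would live in your lexer's readInput method. The sketch assumes the file is UTF-8 text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;

class FileInputSketch {
    // Reads the entire file into one string; end-line characters are kept.
    static String readWholeFile(String fileName) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(fileName));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + fileName, e);
        }
    }

    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        System.out.print("Enter the name of the source file: ");
        if (keyboard.hasNextLine()) {
            String inputString = readWholeFile(keyboard.nextLine().trim());
            System.out.println("Read " + inputString.length() + " characters.");
        }
    }
}
```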

Program Execution
The interpreter should read the input from the file, then begin parsing and executing statements. For each valid assignment statement, your program will 1) add an entry to the symbol table if the variable on the left-hand side has not been referred to before, and 2) evaluate the right-hand side of the assignment and store the result in the memory location associated with the left-hand side variable. If an invalid statement is encountered, your program should print an appropriate error message and exit. If all the code in the input file is syntactically correct, your program should print out the symbol table and main memory.

 

 

If the input file contains the following statements:

x = 5 + 3;
x = x+1;
z = x+10+1;
y = z+x;
 
Your program should output:
 
The symbol table contains:
        id             address
        x               0
        z               1
        y               2 
 
Main Memory contains:
        cell           value
        0               9
        1               20
        2               29
        3               0
        4               0
...

Ignore End-Lines
Your program should treat end-line characters as whitespace and thus handle multiple assignments on a single line and a single assignment on multiple lines, as in:

x = 3; y = x+4; z =

x+1;
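One way to get this behavior is to skip every whitespace character, end-lines included, before reading each token. The helper below is a hypothetical sketch, not part of your existing lexer; it relies on Character.isWhitespace, which returns true for spaces, tabs, and end-line characters alike.

```java
class WhitespaceSketch {
    // Returns the index of the first non-whitespace character at or after
    // position, or input.length() if only whitespace remains.
    static int skipWhitespace(String input, int position) {
        while (position < input.length()
                && Character.isWhitespace(input.charAt(position))) {
            position++;
        }
        return position;
    }
}
```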

Program Design
You will need three new classes: Symbol, SymbolTable, and MainMemory. SymbolTable can be designed as an array of Symbol objects. Each symbol should include a string representing the name of the variable and an int representing the address in memory that the variable occupies.

Parsing should be implemented using a process called recursive descent parsing. With this scheme, you define a method for each syntactic element that you might be expecting. Generally, each of these methods is called parseX, where X is the syntactic element expected. So in your program, you'll define a parseStatement method that checks for a single complete programming statement, a parseId method that checks whether the next token is an id, a parseAssignmentOp method that checks for "=", and a parseExpression method that checks for a sequence of tokens that make up an expression. Each of these parse methods will call your getNextToken method one or more times.
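The overall shape of such a parser is sketched below, under several assumptions: the tokens arrive as a list of strings rather than from your lexer, getNextToken is a stand-in that returns an empty string at end of input, parseExpression also consumes the closing semicolon, and errors are raised as exceptions. Only the message "expecting assignment operator" comes from this handout; the other messages are made up for illustration.

```java
import java.util.List;

class ParserSketch {
    private final List<String> tokens;
    private int position = 0;

    ParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Stand-in for the lexer's getNextToken; returns "" when input runs out.
    private String getNextToken() {
        return position < tokens.size() ? tokens.get(position++) : "";
    }

    private static boolean isId(String token) {
        return !token.isEmpty() && Character.isLetter(token.charAt(0));
    }

    private static boolean isIntLiteral(String token) {
        return !token.isEmpty() && Character.isDigit(token.charAt(0));
    }

    private static void fail(String expected) {
        throw new IllegalStateException("Parse error: expecting " + expected);
    }

    // <identifier> '=' <expression> ';'
    void parseStatement() {
        parseId();
        parseAssignmentOp();
        parseExpression();
    }

    void parseId() {
        if (!isId(getNextToken())) fail("identifier");
    }

    void parseAssignmentOp() {
        if (!getNextToken().equals("=")) fail("assignment operator");
    }

    // Operands separated by plus signs, ended by a semicolon.
    void parseExpression() {
        while (true) {
            String operand = getNextToken();
            if (!isId(operand) && !isIntLiteral(operand)) fail("identifier or integer");
            String next = getNextToken();
            if (next.equals(";")) return;
            if (!next.equals("+")) fail("plus sign or semicolon");
        }
    }

    public static void main(String[] args) {
        new ParserSketch(List.of("y44", "=", "33", "+", "45", ";")).parseStatement();
        System.out.println("legal statement");
        try {
            new ParserSketch(List.of("xyz", "2", ";")).parseStatement();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Parse error: expecting assignment operator"
        }
    }
}
```

Each parse method consumes exactly the tokens it is responsible for, so parseStatement reads as a direct transcription of the grammar rule.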

Part A. In Part A, concentrate on checking a single statement for syntactic correctness -- i.e., parsing. In other words, have your first version simply check a statement read in from the keyboard for correctness and print a message stating if the statement is a legal one or not. In this first part, do not worry about reading from a file or about actually interpreting statements (no symbol table or main memory yet).

Part B. In Part B, you'll read input from a file, and you'll extend your parser so that it also interprets statements, i.e., builds a symbol table and main memory.

 

Grading

You will only be given credit for portions of the project that are fully completed.

TASK                                                                                 Points
Prelim Structure Chart/Updated Chart at End                                          5
Unit tests for all new methods                                                       10
Documentation, formatting, quality                                                   10
Working Program, part A                                                              15
Complete Working Program, Parts A and B                                              60
EXTRA CREDIT: Generate pseudo-assembly code and interpret that code                  10
EXTRA CREDIT: Handle floating point, multiple operators (-, *, /), other statements  1-10

Documentation, formatting, and quality will also be graded on assignment 2:

  1. Documentation and source format will be 5% of your grade. Does your header documentation include the author's name, the purpose of the program, and a description of how to use the program? Are the identifiers meaningful? Are any obscure constructs clearly explained? Does the method header documentation explain the purpose of the method, its parameters, pre- and post-conditions, and any changes to object member variables? Is the indentation consistent? Have blank lines been used so that the program is easy to read? Did you follow the style guidelines provided?
  2. Quality of solution will be 5% of your grade. Are any of your methods more than 15 lines? Are there long or multipurpose methods? Is your solution too clever, i.e., has the solution been condensed to the point where it's incomprehensible?

Submission
You must submit your source code to the cs112 submission directory by the due date. Bring a hard copy of your program to your interactive grading session, which will take place in the lab following the due date.

Collaboration
It is OK for you to discuss solutions to this program with your classmates. However, no collaboration should ever involve looking at one of your classmate's source programs! It is usually extremely easy to determine that someone has copied a program, even when the individual doing the copying has changed identifiers and comments. If we discover that someone has copied a program, the authors of both programs will receive a 0 on the project and University sanctions.