Department of Computer Science
University of San Francisco
CS 112
Interpreter Project
Prelim. Structure Chart due Monday, Sept. 13
Program Part A Due: Sunday, Sept. 19, Midnight
Complete Program Due: Sunday, Sept. 26, Midnight
For programming assignment 2, you should write an interpreter for the C-- (C minus-minus) language. An interpreter is a program that reads and executes the statements in another program. The C-- language is very primitive: it consists only of assignment statements. Assignment statements have the form
<identifier> '=' <expression> ';'
Expressions in C-- are just identifiers and integer literals separated by plus signs. So
y44 = 33 + 45;
zzz = 34567+ 555 + y44;
are legal assignments.
Interpreters
The process of interpreting a program is typically divided into three parts:
lexical analysis, parsing, and execution. As you already know, a lexical
analyzer breaks a program up into tokens -- the smallest units of the program
that can have meaning. If your programming language were English, the lexical
analyzer would identify the individual words and punctuation marks in a
document. The parser determines whether the tokens have been arranged to form
valid statements in the language. So if your programming language were English,
the parser would check your sentences for subjects, verbs, objects, etc.
In the execution phase, the interpreter carries out the actions specified by the statements. Note that the three parts operate together: the parser asks the lexical analyzer for the next token and determines whether the new token can be legally added to the current statement. If it can and if the new token gives the interpreter enough information to carry out an action, it will. For example, in the course of interpreting the statement
xab = pqr + 7 + mn;
the interpreter knows after it has read the variable pqr that it can evaluate pqr -- i.e., retrieve its value from memory. After it has read the first plus sign and the 7, it can add 7 to the value it retrieved for pqr, etc.
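The left-to-right evaluation described above can be sketched as a running total. This is only an illustration, not the required design: the class name RunningSum, the map of variable values, and the sample values 10 and 5 are all assumptions, and the sketch presumes the expression is well formed and every variable already has a value.

```java
import java.util.HashMap;
import java.util.Map;

class RunningSum {
    // Evaluates identifiers and integer literals separated by '+',
    // adding each operand to the total as soon as it has been read.
    static int evaluate(String expression, Map<String, Integer> values) {
        int total = 0;
        for (String operand : expression.split("\\+")) {
            String text = operand.trim();
            if (Character.isDigit(text.charAt(0))) {
                total += Integer.parseInt(text);   // integer literal
            } else {
                total += values.get(text);         // variable: retrieve its stored value
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> values = new HashMap<>();
        values.put("pqr", 10);   // hypothetical value for pqr
        values.put("mn", 5);     // hypothetical value for mn
        System.out.println(evaluate("pqr + 7 + mn", values)); // prints 22
    }
}
```

In your interpreter the operands would come from your lexical analyzer one token at a time rather than from split.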
You wrote a lexical analyzer for programming assignment 1. For programming assignment 2, you will use your lexical analyzer as a basis for writing a C-- interpreter.
Errors
In addition to determining whether a statement is legal, the parser should
print error messages when it encounters a statement it cannot parse. For
example, the statement
xyz 2;
should result in an error message of the form
Parse error: expecting assignment operator
because after "xyz", the only legal input would be an assignment operator. The errors you should report are
After you encounter an error, your program can stop parsing and exit. You need not attempt to recover from the error and continue parsing.
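One possible way to produce messages of this form is to throw an exception that carries the name of the expected element, and to catch it once at the top level. The sketch below is an assumption about structure, not a requirement: ParseError, ErrorDemo, and checkStart are hypothetical names, and the tokens are passed in as a ready-made array instead of coming from your lexer.

```java
class ParseError extends RuntimeException {
    ParseError(String expected) {
        super("Parse error: expecting " + expected);
    }
}

class ErrorDemo {
    // Checks that a statement's tokens begin with <identifier> '='.
    // Assumes every token is a non-empty string.
    static void checkStart(String[] tokens) {
        if (tokens.length < 1 || !Character.isLetter(tokens[0].charAt(0))) {
            throw new ParseError("identifier");
        }
        if (tokens.length < 2 || !tokens[1].equals("=")) {
            throw new ParseError("assignment operator");
        }
    }

    public static void main(String[] args) {
        try {
            checkStart(new String[] {"xyz", "2", ";"});
            System.out.println("Statement starts legally");
        } catch (ParseError e) {
            // prints "Parse error: expecting assignment operator"
            System.out.println(e.getMessage());
            return;   // no recovery: leaving main ends the program
        }
    }
}
```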
Symbol Table
In order to keep track of the variables, you'll need to create a symbol
table. A symbol table stores the names of variables together with the addresses
they've been assigned in memory. When a new variable is assigned a value, you
should add an entry to the end of the symbol table, and assign it the next
available address. The ith variable referred to in the source code, counting from 0, should be
assigned the address i in memory. Note that all C-- programs are restricted to
having fewer than 10 variables.
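A symbol table along these lines might look like the following sketch. The method names lookup, addIfAbsent, and print are assumptions, as is the capacity of 10, which is chosen here only to match the size of main memory.

```java
class Symbol {
    final String name;    // the variable's name
    final int address;    // the memory address assigned to it

    Symbol(String name, int address) {
        this.name = name;
        this.address = address;
    }
}

class SymbolTable {
    static final int MAX_SYMBOLS = 10;   // assumption: one entry per memory cell
    private final Symbol[] symbols = new Symbol[MAX_SYMBOLS];
    private int count = 0;

    // Returns the address of name, or -1 if it has no entry yet.
    int lookup(String name) {
        for (int i = 0; i < count; i++) {
            if (symbols[i].name.equals(name)) return symbols[i].address;
        }
        return -1;
    }

    // Returns the existing address, or appends a new entry at the next free address.
    int addIfAbsent(String name) {
        int address = lookup(name);
        if (address >= 0) return address;
        if (count == MAX_SYMBOLS) throw new IllegalStateException("too many variables");
        symbols[count] = new Symbol(name, count);
        return count++;
    }

    void print() {
        System.out.println("The symbol table contains:");
        System.out.println("id address");
        for (int i = 0; i < count; i++) {
            System.out.println(symbols[i].name + " " + symbols[i].address);
        }
    }
}
```

Because entries are only ever appended, the index of an entry in the array is also its address.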
Main Memory
Main memory should be coded as a class with a data member that is an array of
ten integers.
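A minimal sketch of such a class follows. Only the array of ten integers comes from the description above; the method names load, store, and print are assumptions.

```java
class MainMemory {
    static final int SIZE = 10;
    private final int[] cells = new int[SIZE];   // Java initializes every cell to 0

    // Returns the value stored at the given address.
    int load(int address) {
        return cells[address];
    }

    // Stores value at the given address.
    void store(int address, int value) {
        cells[address] = value;
    }

    void print() {
        System.out.println("Main Memory contains:");
        System.out.println("cell value");
        for (int i = 0; i < SIZE; i++) {
            System.out.println(i + " " + cells[i]);
        }
    }
}
```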
Reading from a File
Your program should get its input from a file. The user should be allowed to
input any file name at the beginning of the program. Your program should then
read the entire contents of the file into the InputString text (i.e., you need
to modify the readInput method of your lexer).
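Reading a whole file into one string could be done roughly as follows. This is a sketch using the standard java.nio.file API; SourceReader, readAll, and promptAndRead are hypothetical names, and how the result reaches your lexer's InputString depends on your own assignment 1 code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;

class SourceReader {
    // Reads the whole file into one string, line breaks included.
    static String readAll(String fileName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Asks the user for a file name, then returns that file's contents.
    static String promptAndRead(Scanner keyboard) throws IOException {
        System.out.print("Enter the name of the source file: ");
        String fileName = keyboard.nextLine().trim();
        return readAll(fileName);
    }
}
```

A call such as SourceReader.promptAndRead(new Scanner(System.in)) at the start of the program would then supply the text to be tokenized.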
Program Execution
The interpreter should read the input from the file, then begin parsing and
executing statements. For each valid assignment statement, your program will
1) add an entry in the symbol table, if the variable on the left hand side has
not been referred to before, 2) evaluate the right-hand side of the assignment
and store the result in a memory location associated with the left-hand side
variable. If an invalid statement is encountered, your program should print an
appropriate error message and exit. If all the code in the input file is
syntactically correct, your program should print out the symbol table and main
memory:
If the input file contains the following statements:
x = 5 + 3;
x = x+1;
z = x+10+1;
y = z+x;
Your program should output:
The symbol table contains:
id address
x 0
z 1
y 2
Main Memory contains:
cell value
0 9
1 20
2 29
3 0
4 0
...
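The values in this sample output can be checked with a compact sketch of the execution step alone. It is not the required design: it assumes the input is already known to be syntactically valid with every statement terminated by a semicolon, it uses split in place of a real parser, and it treats a right-hand-side variable that was never assigned as an error, which is an assumption. TinyInterpreter, addressOf, and run are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class TinyInterpreter {
    final List<String> names = new ArrayList<>();   // position in this list = address
    final int[] memory = new int[10];

    // Returns the address for name, adding a new entry when create is true.
    int addressOf(String name, boolean create) {
        int address = names.indexOf(name);
        if (address < 0) {
            if (!create) throw new IllegalStateException("undefined variable " + name);
            names.add(name);
            address = names.size() - 1;
        }
        return address;
    }

    // Executes statements that are assumed to be syntactically valid.
    void run(String source) {
        for (String statement : source.split(";")) {
            if (statement.trim().isEmpty()) continue;
            String[] sides = statement.split("=");
            // Step 1: make sure the left-hand variable has a symbol table entry.
            int target = addressOf(sides[0].trim(), true);
            // Step 2: evaluate the right-hand side and store the result.
            int sum = 0;
            for (String operand : sides[1].split("\\+")) {
                String text = operand.trim();
                sum += Character.isDigit(text.charAt(0))
                        ? Integer.parseInt(text)
                        : memory[addressOf(text, false)];
            }
            memory[target] = sum;
        }
    }
}
```

Running it on the four sample statements leaves x, z, and y at addresses 0, 1, and 2, holding 9, 20, and 29.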
Ignore End-Lines
Your program should treat end-line characters as whitespace and thus handle
multiple assignments on a single line and a single assignment on multiple lines,
as in:
x = 3; y = x+4; z =
x+1;
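If your lexer skips whitespace with a test such as Character.isWhitespace, end-line characters are ignored automatically, because that method is true for '\n' and '\r' as well as for spaces and tabs. The stand-in tokenizer below illustrates this; LineBlindLexer and tokenize are hypothetical names, and it does not replace your assignment 1 lexer (for instance, it does not reject a token such as 3x).

```java
import java.util.ArrayList;
import java.util.List;

class LineBlindLexer {
    // Splits source text into runs of letters and digits and single-character symbols.
    // Line breaks are skipped exactly like spaces and tabs.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // includes '\n' and '\r'
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));         // '=', '+', ';', or anything else
                i++;
            }
        }
        return tokens;
    }
}
```

With this approach the two-line example above yields the same 16 tokens as the same statements written on a single line.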
Program Design
You will need three new classes, Symbol, SymbolTable, and MainMemory.
SymbolTable can be designed as an array of Symbol. Each symbol should
include a string representing the name of the variable and an int representing
the address in memory that the variable represents.
Parsing should be implemented using a process called recursive descent parsing. With this scheme, you define a method for each syntactic element that you might be expecting. Generally, each of these methods is called parseX, where X is the syntactic element expected. So in your program, you'll define a parseStatement method that checks for a single complete programming statement, a parseId method that checks to see if the first token is an id, a parseAssignmentOp method that checks for "=", and a parseExpression method that checks for a sequence of tokens that makes up an expression. Each of these parse methods will call your getNextToken method one or more times.
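The parseX methods described above might fit together as in the following sketch. Several things here are assumptions made only to keep the example self-contained: the constructor contains a stand-in lexer, getNextToken returns plain strings, an identifier is taken to be a letter followed by letters or digits, parseOperand and isLegal are extra helper names, and parseExpression also consumes the terminating semicolon because this sketch has no lookahead.

```java
import java.util.ArrayList;
import java.util.List;

class StatementParser {
    private final List<String> tokens = new ArrayList<>();
    private int position = 0;

    StatementParser(String source) {
        // Stand-in lexer: pad the three symbols with spaces, then split on whitespace.
        String spaced = source.replaceAll("([=+;])", " $1 ").trim();
        if (!spaced.isEmpty()) {
            for (String t : spaced.split("\\s+")) tokens.add(t);
        }
    }

    // Returns the next token, or "" when the input is exhausted.
    private String getNextToken() {
        return position < tokens.size() ? tokens.get(position++) : "";
    }

    private static boolean isId(String t) { return t.matches("[A-Za-z][A-Za-z0-9]*"); }
    private static boolean isInt(String t) { return t.matches("[0-9]+"); }

    private void fail(String expected) {
        throw new IllegalStateException("Parse error: expecting " + expected);
    }

    // <statement> ::= <identifier> '=' <expression> ';'
    void parseStatement() {
        parseId();
        parseAssignmentOp();
        parseExpression();   // also consumes the terminating ';'
    }

    void parseId() {
        if (!isId(getNextToken())) fail("identifier");
    }

    void parseAssignmentOp() {
        if (!getNextToken().equals("=")) fail("assignment operator");
    }

    // Operands separated by '+', followed by ';'.
    void parseExpression() {
        parseOperand();
        String t = getNextToken();
        while (t.equals("+")) {
            parseOperand();
            t = getNextToken();
        }
        if (!t.equals(";")) fail("plus sign or semicolon");
    }

    void parseOperand() {
        String t = getNextToken();
        if (!isId(t) && !isInt(t)) fail("identifier or integer literal");
    }

    // Reports whether a single statement is legal, printing the error if not.
    static boolean isLegal(String statement) {
        try {
            new StatementParser(statement).parseStatement();
            return true;
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            return false;
        }
    }
}
```

For example, StatementParser.isLegal("y44 = 33 + 45;") returns true, while StatementParser.isLegal("xyz 2;") prints "Parse error: expecting assignment operator" and returns false.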
Part A. In Part A, concentrate on checking a single statement for syntactic correctness -- i.e., parsing. In other words, have your first version simply check a statement read in from the keyboard for correctness and print a message stating whether the statement is legal. In this first part, do not worry about reading from a file or about actually interpreting statements (no symbol table or main memory yet).
Part B. In Part B, you'll read input from a file, and you'll extend your parser so that it also interprets statements, i.e., builds a symbol table and main memory.
Grading
You will only be given credit for portions of the project that are fully completed.
| TASK | Points |
| Prelim Structure Chart/Updated Chart at End | 5 |
| Unit tests for all new methods | 10 |
| Documentation, formatting, quality | 10 |
| Working Program, part A | 15 |
| Complete Working Program, Parts A and B | 60 |
| EXTRA CREDIT: Generate pseudo-assembly code and interpret that code | 10 |
| EXTRA CREDIT: Handle floating point, multiple operators (-, *, /), other statements | 1-10 |
Documentation, formatting, and quality will also be graded on assignment 2:
Submission
You must submit your source code to the cs112 submission directory by the
due date. Bring a hard copy of your program to your interactive grading session,
which will occur at the lab following the due date.
Collaboration
It is OK for you to discuss solutions to this program with your classmates.
However, no collaboration should ever involve looking at one of your
classmate's source programs! It is usually extremely easy to determine that
someone has copied a program, even when the individual doing the copying has
changed identifiers and comments. If we discover that someone has copied a
program, the authors of both programs will receive a 0 on the project
and face University sanctions.