Assigned: Monday, September 12
Program Due: Wednesday, September 28.
In this project, you'll write the first half of a simple
interpreter. This interpreter will allow the user to enter arithmetic
expressions (including variables), evaluate these expressions, and
print out the result. Here's some example output:
[brooks@valis ~] java interpreter
>>> 3 + 4 - 5 + 6
>>> x = 5 - 4
>>> y = 4 + 6 + x + 3 - 7 + 1
The problem of building an interpreter can be broken into two pieces:
identifying the pieces of an expression and then evaluating the
expression. In project 1 you'll build the first part, often referred
to as a lexer.
A lexer is the initial component of a compiler. It takes a stream of
characters and breaks them into tokens. A token can be
thought of as the basic unit or building block of a program.
In this project, you will write a Lexer that can read a single line
containing an expression from System.in and separate it into
tokens. Your lexer should be able to recognize the following sorts of
tokens:
- Identifiers: A letter followed by zero or more digits or letters. For
example, v1, xj23, myvar, and x are all valid identifiers.
- Integers: A series of one or more digits.
- The assignment operator: =
- The addition operator: +
- The subtraction operator: -
- All other tokens should be identified as 'unknown'
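To make the token categories above concrete, here is a minimal sketch that classifies a string that has already been cut out of the input. It is not the required design: the class name TokenKindSketch, the classify method, and the type strings are all assumptions made up for illustration, and it uses regular expressions rather than the character-by-character methods the project asks you to write.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not one of the required classes.
class TokenKindSketch {
    // The names and string values of these constants are assumptions.
    static final String IDENTIFIER = "identifier";
    static final String INTEGER = "integer";
    static final String ASSIGN = "assignment";
    static final String PLUS = "plus";
    static final String MINUS = "minus";
    static final String UNKNOWN = "unknown";

    // A letter followed by zero or more letters or digits.
    private static final Pattern ID = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    // One or more digits.
    private static final Pattern INT = Pattern.compile("[0-9]+");

    // Returns the token type for a single, already-separated piece of text.
    static String classify(String lexeme) {
        if (ID.matcher(lexeme).matches()) return IDENTIFIER;
        if (INT.matcher(lexeme).matches()) return INTEGER;
        switch (lexeme) {
            case "=": return ASSIGN;
            case "+": return PLUS;
            case "-": return MINUS;
            default:  return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"v1", "342", "=", "+", "-", "%", "1y"}) {
            System.out.println(s + " : " + classify(s));
        }
    }
}
```

Note that "1y" matches neither pattern, so it falls through to unknown, as does any single character that is not one of the three operators.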
Your program should read in a string and print out the tokens found in
order of appearance, along with their type.
Here is an example:
Let's say the input was: X12 342=+ % z
Your output would be:
If the input was: xxxyyy= 12+-# x 1y
Your output would be:
Some things to notice:
- Whitespace is not a token. It's just used to separate tokens.
- Integers followed by letters are unknown tokens.
- For this program, we don't care if the order of the tokens makes
any sense. (We'll get to that in project 2.) We just want to identify
the tokens.
- We're not going to worry about other math operators (*, /) -
just plus and minus.
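The first two points above can be sketched as a small cursor that walks the input with an index. This is only one possible reading of the rules: the class name ScanSketch and its next method are hypothetical, and treating a whole run of letters and digits (such as 1y) as a single piece of text is an assumption. It only separates the pieces; deciding each piece's type is a separate step.

```java
// Hypothetical cursor for illustration; the real project uses InputString.
class ScanSketch {
    private final String text;
    private int index = 0; // position of the character currently being analyzed

    ScanSketch(String text) {
        this.text = text;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the next piece of text, or null when only whitespace remains.
    String next() {
        // Whitespace is not a token: skip over it.
        while (index < text.length() && Character.isWhitespace(text.charAt(index))) {
            index++;
        }
        if (index >= text.length()) {
            return null;
        }
        int start = index;
        char c = text.charAt(index);
        if (isLetter(c) || isDigit(c)) {
            // Assumption: consume the whole run of letters and digits, so that
            // "1y" stays together and can later be reported as unknown.
            while (index < text.length()
                    && (isLetter(text.charAt(index)) || isDigit(text.charAt(index)))) {
                index++;
            }
        } else {
            // Any other character stands alone.
            index++;
        }
        return text.substring(start, index);
    }

    public static void main(String[] args) {
        ScanSketch s = new ScanSketch("X12 342=+ % z");
        for (String piece = s.next(); piece != null; piece = s.next()) {
            System.out.println(piece);
        }
    }
}
```

Each piece is printed as soon as it is found, so nothing needs to be stored for later.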
You must conform to the following requirements for full credit:
- Program the Lexer in Java.
- Your program should consist of three classes, each defined in a
separate class file:
- Lexer.java. This should contain your main method, plus some
- InputString.java: This will be responsible for managing the input
string. It will contain an index that indicates the character
currently being analyzed. It should have the following methods:
- Token.java: This will contain the Token class. The InputString
will create Tokens that will be returned by getNextToken. Tokens
should contain a token type (identifier, equalssign, etc) and a token
value (x12, y1, etc). It should also have setter and getter methods
for the token type and value. The token class should also contain a
toString() method. The various token types should be defined as
constants within the Token class using 'static final' string variables.
- Do not store the tokens in an array for later use. Your program
just needs to sequentially process them, print them out, and then
- You should build this program in a bottom-up fashion. For each
method you write, you must write a test or driver method that
allows you to test it in isolation.
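As a rough illustration of the Token description and the driver requirement above, here is a minimal sketch. The class name TokenSketch, the constant names and values, the "value : type" format used by toString(), and the testToString driver are all assumptions chosen for the example, not the required names or output format.

```java
// Hypothetical stand-in for Token.java; names and formats are assumptions.
class TokenSketch {
    // Token types defined as 'static final' String constants.
    static final String IDENTIFIER = "identifier";
    static final String INTEGER = "integer";
    static final String EQUALSSIGN = "equalssign";
    static final String UNKNOWN = "unknown";

    private String type;
    private String value;

    TokenSketch(String type, String value) {
        this.type = type;
        this.value = value;
    }

    String getType() { return type; }
    void setType(String type) { this.type = type; }

    String getValue() { return value; }
    void setValue(String value) { this.value = value; }

    // Assumed output format: the value, then the type.
    @Override
    public String toString() {
        return value + " : " + type;
    }

    // Driver method: exercises toString() in isolation and reports success.
    static boolean testToString() {
        TokenSketch t = new TokenSketch(IDENTIFIER, "x12");
        return t.toString().equals("x12 : identifier");
    }

    public static void main(String[] args) {
        System.out.println("testToString passed: " + testToString());
    }
}
```

Writing the driver next to the method it tests lets you check each piece before anything else depends on it, which is the point of building bottom-up.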
You will only be given credit for portions of the project that are
fully completed, i.e. having a few methods with unit tests that
execute correctly is better than having source code for the entire
program that doesn't run at all.
No credit will be given for source code that does not
compile/run. "Almost working" does not count.
You should complete the following tasks in this order:
- readInput method in InputString class, with unit test. 10 points.
- printToken method in Token class, with unit test. 10 points.
- getInt method in InputString class, with unit test. 10 points.
- getId method in InputString class, with unit test. 10 points.
- getNextToken method in InputString class with unit test. 10 points.
- Complete Working Program, as specified above. 50 points.
You must write drivers for all methods in your program
(except main). Note that a working program without driver methods
will be worth only 50 points.
*** EXTRA CREDIT ***
You may do any or all of the extra credit detailed below. I will not
accept extra credit for incomplete programs. In other words, you have
to do the required parts first.
- Accept all mathematical operators (+,-,*,/) as valid tokens: 2 points.
- Accept real numbers (e.g., 42.543) as a valid token: 3 points.
- Read in the single line of input from a file, rather than from
System.in: 2 points.
- Read in a sequence of lines of input from a file and process each
line: 3 points.