CS 112

Lexer Assignment

Assigned: Monday, September 12

Program Due: Wednesday, September 28


In this project, you'll write the first half of a simple interpreter. This interpreter will allow the user to enter arithmetic expressions (including variables), evaluate these expressions, and print out the result. Here's some example output:
[brooks@valis ~] java interpreter
>>> 3 + 4 - 5 + 6
8
>>> x = 5 - 4
1
>>> y = 4 + 6 + x + 3 - 7 + 1
8
Building an interpreter can be broken into two parts: identifying the pieces (tokens) of an expression, and then evaluating the expression. In project 1 you'll build the first part, which is usually called a lexer.

A lexer is the first stage of a compiler or interpreter. It takes a stream of characters and groups them into tokens. A token can be thought of as the basic unit, or building block, of a program.

In this project, you will write a Lexer that reads a single line containing an expression from System.in and separates it into tokens. Your lexer should be able to recognize the following sorts of tokens:

Your program should read in a line of input and print the tokens it finds, in order of appearance, along with the type of each.
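A minimal sketch of reading the single input line, assuming java.util.Scanner is acceptable for this course. The class name ReadLineDemo and the method readOneLine are hypothetical; in the required design this work belongs in InputString's readInput method.

```java
import java.io.InputStream;
import java.util.Scanner;

public class ReadLineDemo {
    // Reads one line from the stream; returns "" if no input is available.
    // The Scanner is deliberately not closed, because closing it would also
    // close System.in.
    static String readOneLine(InputStream in) {
        Scanner scanner = new Scanner(in);
        return scanner.hasNextLine() ? scanner.nextLine() : "";
    }

    public static void main(String[] args) {
        String line = readOneLine(System.in);
        System.out.println("Read: " + line);
    }
}
```

Taking an InputStream parameter instead of hard-coding System.in makes the method easy to drive from a test with a ByteArrayInputStream.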

Here is an example:

Let's say the input was: X12 342=+ % z

Your output would be:

Type         Value
IDENTIFIER   X12
INTEGER      342
EQUALSSIGN   =
PLUSSIGN     +
UNKNOWN      %
IDENTIFIER   z
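The columns in the sample output line up because every type name is padded to 13 characters (the longest names shown, IDENTIFIER and EQUALSSIGN, are 10 characters, plus 3 spaces). One way to reproduce that layout is String.format with a left-justified width; the class TokenTable and helper formatRow below are hypothetical names used only for illustration.

```java
public class TokenTable {
    // Width of the "Type" column, matching the sample output:
    // 10 characters for the longest type name plus 3 spaces of padding.
    static final int TYPE_WIDTH = 13;

    // Left-justifies the type in a fixed-width column, then appends the value.
    static String formatRow(String type, String value) {
        return String.format("%-" + TYPE_WIDTH + "s%s", type, value);
    }

    public static void main(String[] args) {
        System.out.println(formatRow("Type", "Value"));
        System.out.println(formatRow("IDENTIFIER", "X12"));
        System.out.println(formatRow("INTEGER", "342"));
    }
}
```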

If the input was: xxxyyy= 12+-# x 1y

Your output would be:

Type         Value
IDENTIFIER   xxxyyy
EQUALSSIGN   =
INTEGER      12
PLUSSIGN     +
MINUSSIGN    -
UNKNOWN      #
IDENTIFIER   x
UNKNOWN      1y
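The two examples pin down most of the scanning rules, though the handout's own token definitions are what actually govern. The sketch below assumes: whitespace separates tokens; a run of letters and digits that starts with a letter is an IDENTIFIER; a run of only digits is an INTEGER; a run that starts with a digit but contains a letter (like 1y) is UNKNOWN; =, + and - are one-character symbols; and any other character is a one-character UNKNOWN. Under those assumptions it reproduces both sample outputs. ScanSketch, Tok, nextToken and classifyRun are hypothetical names; in the required design this logic is spread across InputString's getNextToken, hasMoreTokens, getID, getInt and getSymbol, and Tok would be the Token class.

```java
public class ScanSketch {
    // Hypothetical holder for one token; the required design uses Token instead.
    static final class Tok {
        final String type;
        final String value;
        Tok(String type, String value) { this.type = type; this.value = value; }
    }

    private final String input;
    private int pos = 0;  // index of the character currently being analyzed

    ScanSketch(String input) { this.input = input; }

    // Skips whitespace and reports whether any characters remain to scan.
    boolean hasMoreTokens() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return pos < input.length();
    }

    // Scans one token starting at pos. Call only after hasMoreTokens() is true.
    Tok nextToken() {
        char c = input.charAt(pos);
        if (Character.isLetterOrDigit(c)) {
            int start = pos;
            // Consume the longest run of letters and digits, e.g. "X12" or "1y".
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
                pos++;
            }
            String run = input.substring(start, pos);
            return new Tok(classifyRun(run), run);
        }
        pos++;  // every symbol token is a single character
        switch (c) {
            case '=': return new Tok("EQUALSSIGN", "=");
            case '+': return new Tok("PLUSSIGN", "+");
            case '-': return new Tok("MINUSSIGN", "-");
            default:  return new Tok("UNKNOWN", String.valueOf(c));
        }
    }

    // Starts with a letter -> IDENTIFIER; all digits -> INTEGER;
    // starts with a digit but contains a letter (e.g. "1y") -> UNKNOWN.
    static String classifyRun(String run) {
        if (Character.isLetter(run.charAt(0))) {
            return "IDENTIFIER";
        }
        for (int i = 0; i < run.length(); i++) {
            if (!Character.isDigit(run.charAt(i))) {
                return "UNKNOWN";
            }
        }
        return "INTEGER";
    }

    public static void main(String[] args) {
        ScanSketch scanner = new ScanSketch("xxxyyy= 12+-# x 1y");
        System.out.printf("%-13s%s%n", "Type", "Value");
        while (scanner.hasMoreTokens()) {
            Tok t = scanner.nextToken();
            System.out.printf("%-13s%s%n", t.type, t.value);  // print, then discard
        }
    }
}
```

Each token is printed as soon as it is scanned and then dropped, so nothing is collected into an array. Scanning the longest run of letters and digits first, and only then deciding its type, is what makes 1y come out as a single UNKNOWN token rather than INTEGER 1 followed by IDENTIFIER y.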

Some things to notice:


Design Requirements

You must conform to the following requirements for full credit:

  1. Program the Lexer in Java.
  2. Your program should consist of three classes, each defined in its own source file:
    1. Lexer.java: This should contain your main method, plus some testing methods.
    2. InputString.java: This will be responsible for managing the input string. It will contain an index that indicates the character currently being analyzed. It should have the following methods:
      • readInput
      • getNextToken
      • hasMoreTokens
      • getID
      • getInt
      • getSymbol
    3. Token.java: This will contain the Token class. The InputString will create Tokens that are returned by getNextToken. Each Token should contain a token type (identifier, equalssign, etc.) and a token value (x12, y1, etc.). It should also have setter and getter methods for the token type and value, and a toString() method. The various token types should be defined as constants within the Token class using static final String variables.
  3. Do not store the tokens in an array for later use. Your program just needs to sequentially process them, print them out, and then discard them.
  4. You should build this program in a bottom-up fashion. For each method you write, you must write a test or driver method that allows you to test it in isolation.
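A sketch of the Token class described in item 3, assuming the six type names from the sample output are the complete set (the handout's token list may define others). The driver method testToken illustrates requirement 4; it is a hypothetical example, and in the required three-file layout it would more naturally live in Lexer.java and be called from main.

```java
public class Token {
    // Token types as static final String constants.
    // These are the names that appear in the sample output.
    public static final String IDENTIFIER = "IDENTIFIER";
    public static final String INTEGER    = "INTEGER";
    public static final String EQUALSSIGN = "EQUALSSIGN";
    public static final String PLUSSIGN   = "PLUSSIGN";
    public static final String MINUSSIGN  = "MINUSSIGN";
    public static final String UNKNOWN    = "UNKNOWN";

    private String type;
    private String value;

    public Token(String type, String value) {
        this.type = type;
        this.value = value;
    }

    public String getType() { return type; }
    public void setType(String type) { this.type = type; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }

    // One row of the output table: the type left-justified in a
    // 13-character column, followed by the value.
    @Override
    public String toString() {
        return String.format("%-13s%s", type, value);
    }

    // Driver method: exercises the constructor, getters, setters and
    // toString in isolation, prints a pass/fail line, and returns the result.
    static boolean testToken() {
        Token t = new Token(IDENTIFIER, "X12");
        boolean ok = t.getType().equals(IDENTIFIER) && t.getValue().equals("X12");
        t.setType(INTEGER);
        t.setValue("342");
        ok = ok && t.getType().equals(INTEGER) && t.getValue().equals("342");
        ok = ok && t.toString().equals("INTEGER      342");
        System.out.println("testToken: " + (ok ? "passed" : "FAILED"));
        return ok;
    }
}
```

Because toString already produces a formatted table row, printing a token is just System.out.println(token).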

Grading

You will only be given credit for portions of the project that are fully completed, i.e., having a few methods with unit tests that execute correctly is better than having source code for the entire program that doesn't run at all.

No credit will be given for source code that does not compile/run. "Almost working" does not count.

You should complete the following tasks in this order:

You must write drivers for all methods in your program (except main). Note that a working program without driver methods will be worth only 50 points.

*** EXTRA CREDIT ***

You may do any or all of the extra credit detailed below. I will not accept extra credit for incomplete programs. In other words, you have to do the required parts first.