CS 110: Final Project Part A

Lexer

Due TBA

A lexer reads a stream of characters and attempts to identify the valid words, or tokens, within it. Your task is to write a Lexer program that identifies the valid tokens in one line of programming code. Your Lexer should recognize Java identifiers, integers, the assignment operator (=), the plus sign (+), and the semicolon (;). Every other non-whitespace character should be reported as an unknown symbol.
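"Java identifiers" here means Java's own naming rules. As a rough illustration, the standard methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart encode those rules one character at a time. The class IdentifierRules and the method looksLikeIdentifier are hypothetical names. This sketch does not treat reserved words such as int specially.

```java
// Hypothetical helper showing what "Java identifier" means character by character.
class IdentifierRules {
    // True if word is non-empty, its first character may begin a Java identifier
    // (in printable ASCII: a letter, '_' or '$'), and every later character may
    // appear inside one (in printable ASCII: those, plus digits).
    // Reserved words such as "int" are NOT rejected by this check.
    static boolean looksLikeIdentifier(String word) {
        if (word.isEmpty() || !Character.isJavaIdentifierStart(word.charAt(0))) {
            return false;
        }
        for (int i = 1; i < word.length(); i++) {
            if (!Character.isJavaIdentifierPart(word.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"X12", "y1", "_tmp", "$x", "12x", "a-b"}) {
            System.out.println(w + " -> " + looksLikeIdentifier(w));
        }
    }
}
```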

After reading a string from the keyboard, your program should list, in order, the tokens it identifies. When it reaches the end of the string, it should list one final token, ENDPROGRAM. For example, if the string

X12 342=+; % z

is input, your lexer should output:

Token Type          Token Value

IDENTIFIER          X12
INTEGER             342
EQUALSIGN           =
PLUSSIGN            +
SEMICOLON           ;
UNKNOWN SYMBOL      %
IDENTIFIER          z
ENDPROGRAM

For the string

xxxyyy= 12++# x       y1

your program should output:

Token Type          Token Value

IDENTIFIER          xxxyyy
EQUALSIGN           =
INTEGER             12
PLUSSIGN            +
PLUSSIGN            +
UNKNOWN SYMBOL      #
IDENTIFIER          x
IDENTIFIER          y1
ENDPROGRAM

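Note that whitespace (spaces, tabs, and end-of-line characters) is not a token. It only separates tokens, and your program should skip over it without printing anything. Tokens do not have to be separated by whitespace, however: in the first example, 342=+; yields four tokens.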
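A minimal sketch of skipping whitespace, assuming (as the design requirements describe for ProgramStatement) that the statement is held in a String with an int index marking the current character. WhitespaceDemo and skipWhitespace are hypothetical names. The standard Character.isWhitespace treats spaces, tabs, and line terminators as whitespace.

```java
// Hypothetical helper: advance past whitespace before looking for the next token.
class WhitespaceDemo {
    // Returns the index of the first non-whitespace character at or after start,
    // or text.length() if only whitespace remains (the end of the statement).
    static int skipWhitespace(String text, int start) {
        int i = start;
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String s = "x \t y1";              // space, tab, space between x and y1
        int next = skipWhitespace(s, 1);   // index 1 is the first whitespace after x
        System.out.println(next + " " + s.charAt(next));
    }
}
```

When the returned index equals the length of the string, nothing but whitespace was left, which is the point at which ENDPROGRAM would be reported.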

Design Requirements

Conform to the following requirements for full credit.

Define and use two classes: ProgramStatement and Token. ProgramStatement contains the string input by the user and an index that tracks which character is currently being analyzed. Token contains a token type and a token value (the left and right columns, respectively, of the tables above). ProgramStatement should contain the methods readInput, getNextToken, getId, getInt, and getSymbol. Both classes should contain appropriate constructors and toString methods.

As you design your program, try to understand the purpose of each of these methods, and check with your professor that you are on the right track. Do NOT store an array of tokens: after a token is identified, print it, then get the next one.
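One possible shape for the two classes, offered as a sketch rather than a reference solution. It uses the method names from the requirements, but several details are assumptions: token types are plain Strings, readInput takes a java.util.Scanner so it can also be exercised on a fixed string, Token.toString pads the type to an 18-character column, and identifiers follow Character.isJavaIdentifierStart and isJavaIdentifierPart. The Lexer class shows the print-then-fetch loop; no array of tokens is kept.

```java
import java.util.Scanner;

// Sketch of Token: a token type paired with the text it matched.
class Token {
    private final String type;   // e.g. "IDENTIFIER", "ENDPROGRAM"
    private final String value;  // matched text; "" for ENDPROGRAM

    Token(String type, String value) {
        this.type = type;
        this.value = value;
    }

    String getType() { return type; }
    String getValue() { return value; }

    @Override
    public String toString() {
        // Pad the type to an 18-character column so the values line up.
        return value.isEmpty() ? type : String.format("%-18s %s", type, value);
    }
}

// Sketch of ProgramStatement: the input line plus the index of the next unread character.
class ProgramStatement {
    private String text;
    private int index;

    ProgramStatement() { this(""); }
    ProgramStatement(String text) { this.text = text; this.index = 0; }

    // Reads one line of input and restarts scanning at its first character.
    void readInput(Scanner in) {
        text = in.hasNextLine() ? in.nextLine() : "";
        index = 0;
    }

    // Skips whitespace, then dispatches on the first character of the next token.
    Token getNextToken() {
        while (index < text.length() && Character.isWhitespace(text.charAt(index))) {
            index++;
        }
        if (index >= text.length()) return new Token("ENDPROGRAM", "");
        char c = text.charAt(index);
        if (Character.isJavaIdentifierStart(c)) return getId();
        if (Character.isDigit(c)) return getInt();
        return getSymbol();
    }

    // Consumes the longest run of characters allowed inside a Java identifier.
    Token getId() {
        int start = index;
        while (index < text.length() && Character.isJavaIdentifierPart(text.charAt(index))) {
            index++;
        }
        return new Token("IDENTIFIER", text.substring(start, index));
    }

    // Consumes the longest run of digits.
    Token getInt() {
        int start = index;
        while (index < text.length() && Character.isDigit(text.charAt(index))) {
            index++;
        }
        return new Token("INTEGER", text.substring(start, index));
    }

    // Consumes exactly one character and classifies it.
    Token getSymbol() {
        char c = text.charAt(index++);
        switch (c) {
            case '=': return new Token("EQUALSIGN", "=");
            case '+': return new Token("PLUSSIGN", "+");
            case ';': return new Token("SEMICOLON", ";");
            default:  return new Token("UNKNOWN SYMBOL", String.valueOf(c));
        }
    }

    @Override
    public String toString() {
        return "ProgramStatement[text=\"" + text + "\", index=" + index + "]";
    }
}

class Lexer {
    public static void main(String[] args) {
        ProgramStatement stmt = new ProgramStatement();
        stmt.readInput(new Scanner(System.in));
        System.out.println(String.format("%-18s %s", "Token Type", "Token Value"));
        Token token;
        do {
            token = stmt.getNextToken();  // get one token...
            System.out.println(token);    // ...print it, and never store it
        } while (!token.getType().equals("ENDPROGRAM"));
    }
}
```

Because getId and getInt each consume the longest run of matching characters, an input such as 342=+ splits at the = without needing a space, and typing X12 342=+; % z should produce the same token types and values as the first sample table.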

Extra Credit

Accept real numbers (e.g., 42.543) as valid tokens (3 points)
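For this extra credit, one approach (an assumption, not something the assignment specifies) is to extend the integer scan: after the leading digits, consume a '.' only when a digit follows it. The REAL token type and the names NumberScan, scanNumber, and typeOf are hypothetical.

```java
// Hypothetical sketch: scan an integer or a real number such as 42.543.
class NumberScan {
    // Returns the number that starts at text.charAt(start), which is assumed to be a digit.
    // A '.' is consumed only when a digit follows it, so "42.;" yields "42".
    static String scanNumber(String text, int start) {
        int i = start;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        if (i + 1 < text.length() && text.charAt(i) == '.'
                && Character.isDigit(text.charAt(i + 1))) {
            i++;  // consume the decimal point
            while (i < text.length() && Character.isDigit(text.charAt(i))) {
                i++;
            }
        }
        return text.substring(start, i);
    }

    // Hypothetical token type: REAL if the lexeme contains a decimal point, else INTEGER.
    static String typeOf(String lexeme) {
        return lexeme.indexOf('.') >= 0 ? "REAL" : "INTEGER";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42.543;", "342=+", "42.;"}) {
            String n = scanNumber(s, 0);
            System.out.println(typeOf(n) + " " + n);
        }
    }
}
```

Requiring a digit after the '.' is a design choice: with it, a stray trailing dot is left for the symbol scanner to report as an unknown symbol rather than being swallowed into the number.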

Instead of reading text from the user, read it from a file (3 points)
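A sketch of reading the statement from a file instead of the keyboard, using the standard java.nio.file.Files.readAllLines. Taking the file name from the command line, the default name statement.txt, and lexing only the first line (the assignment describes one line of code) are all assumptions; FileInput and readFirstLine are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class FileInput {
    // Returns the first line of the file, or "" if the file is empty.
    static String readFirstLine(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.isEmpty() ? "" : lines.get(0);
    }

    public static void main(String[] args) throws IOException {
        // File name from the command line; "statement.txt" is a hypothetical default.
        Path file = Paths.get(args.length > 0 ? args[0] : "statement.txt");
        System.out.println(readFirstLine(file));
    }
}
```

The returned string could then be given to ProgramStatement in place of the line read from the keyboard.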